ABSTRACT

Diagnosis using expert systems has become extremely popular.
Many researchers have proposed methods for developing intelligent
diagnosis techniques that use knowledge about the structure and
behavior of a system. One of the many ways to represent this
knowledge is the fault tree representation. Though mostly used
for reliability assessment, fault trees represent the cause-effect
relationship between the symptoms and the causes in a structured
manner and at the same time provide data for probabilistic analyses.

For most practical systems, where a set of symptoms points to a large
number of faults, uncertainty exists about the occurrence of a
particular fault. Diagnosis using fault trees attempts to use the
logical formulation representing the symptom-cause relationship and
couples it with probabilistic techniques to diagnose a fault when
uncertainty exists about its occurrence.

In the present case, fault trees have been constructed for the
Electromechanical Shutdown Rod system of CANDU reactors and for some
selected problems associated with the IBM PC, and the search strategy
has been tested using these fault trees. The results show that in most
cases where the occurrence of faults is guided by the probability of
occurrence of that fault, probabilistic techniques can greatly reduce
the number of searches required to diagnose a fault (when some
uncertainty exists about its occurrence) from what would have been
required by an exhaustive approach.

For testing the effectiveness of the diagnosis strategy in the case
of faults associated with the electromechanical shutdown rod system of
CANDU reactors, a simulated version of the system was used. The
simulator was developed using properties of signal flow graphs. The
tests for PC fault diagnosis have been carried out on an IBM
compatible PC/XT having known problems. An interface has also been
developed to enable the user to develop the fault tree in an
interactive manner.

CHAPTER ONE
INTRODUCTION

Any functional item will have a certain desired or required
performance specification, and a failure is said to have occurred if
it fails to operate in the desired manner. This project attempts to
diagnose faults using failure rates or reliability data and uses a
MYCIN[3]-like uncertainty handling technique.

A program has been developed to simulate faults in
electromechanical systems; it can simulate faults which produce
changes in the steady state behavior of such systems. The EMSR
(Electromechanical Shutdown Rod) system for CANDU reactors has been
simulated with this program under normal and faulted conditions.
The system response, together with the reliability data available
for the CANDU EMSR system, has been used for fault diagnosis
through the diagnosis software. The diagnosis strategy has also
been applied to some selected problems associated with IBM
personal computers.


1.1 FAULT AND FAULT TYPES 

Failure modes for most systems, electronic systems in particular,
can be classified into two broad categories[1]:

(a) Through drift failures: Due to gradual degradation, faults occur
when a certain threshold value is crossed. A typical example is the
longevity problem of a rechargeable battery, where the internal
resistance increases in direct proportion to the time the load is
applied and recovers at a rate equal to one half of this value. For
such a system, a failure is said to have occurred when the internal
resistance change crosses a value of 4.00 ohms.
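The battery example can be traced numerically. The sketch below is only an illustration of the drift mechanism described above: the load schedule, the rise rate and the function name `first_failure_time` are assumptions introduced here, and only the 4.00 ohm threshold and the half-rate recovery come from the text.

```python
def first_failure_time(schedule, rise_rate, threshold=4.0):
    """Return the time at which the resistance change first reaches the
    threshold, or None if it never does.

    schedule  : list of (loaded, duration) pairs (hypothetical input format)
    rise_rate : assumed resistance rise per unit time while loaded
    Recovery while unloaded is taken as half of rise_rate, as in the text.
    """
    change = 0.0   # accumulated change in internal resistance (ohms)
    clock = 0.0    # elapsed time
    for loaded, duration in schedule:
        if loaded:
            if change + rise_rate * duration >= threshold:
                # Threshold is crossed somewhere inside this interval.
                return clock + (threshold - change) / rise_rate
            change += rise_rate * duration
        else:
            # The change cannot recover below zero.
            change = max(0.0, change - 0.5 * rise_rate * duration)
        clock += duration
    return None
```

With an assumed rise rate of 1 ohm per hour, a schedule of three hours loaded, two hours resting and three hours loaded crosses the threshold two hours into the second loaded period.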

(b) Chance failure or catastrophic failure: Such a failure occurs
at random within the operational time of an equipment and before
wear becomes predominant. Typical examples are shorted electron
tubes and wire breakages. In this case, the lifetime of the component
may be represented by a random variable $T$, for which the survival
function $v(t)$[1] is defined as:

$$v(t) = p(T > t)$$

where $p(T > t)$ is the probability that a component has a failure at
a time after $t$.

If the above hypothesis is accepted, then the probability that a
component has a failure at a time earlier than $t$ is:

$$p(T < t) = 1 - v(t) = \Phi(t)$$

$\Phi(t)$ is called the distribution function and $v(t)$ is called the
complementary distribution function.
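As a concrete instance of the survival function and the distribution function, the sketch below assumes an exponential lifetime law with a constant failure rate. That choice of law and the parameter name `rate` are assumptions made for illustration; the text itself does not fix a particular law at this point.

```python
import math

def survival(t, rate):
    """Survival function p(T > t) for an assumed constant failure rate."""
    if t < 0:
        return 1.0  # the component cannot have failed before time zero
    return math.exp(-rate * t)

def distribution(t, rate):
    """Distribution function p(T < t), the complement of the survival function."""
    return 1.0 - survival(t, rate)
```

For any `t` the two values sum to one, which is the relation stated above.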

Survival functions are broadly classified into three types as
follows[1]:

(a) Type I. The random variable $T$ representing the lifetime of a
component is a discrete variable. The survival function is a step
function.

(b) Type IIa. The random variable $T$ takes its values in $[0, \infty)$
and its distribution function $\Phi(t)$ and survival function $v(t)$
are piecewise continuous for all $t$, and $\Phi(t)$ and $v(t)$ admit a
derivative in the interval where they are defined.

(c) Type IIb. The random variable $T$ takes its values in $[0, \infty)$
and the corresponding distribution function $\Phi(t)$ and the survival
function $v(t)$ are piecewise continuous and make at least one jump.

The survival law defined by a rate of deterioration is applicable in
case IIa and is the one mostly used.

1.2 DIAGNOSIS: DETERMINISTIC & NON DETERMINISTIC MODELS 

(a) Deterministic model

Let $S = \{s_1, s_2, s_3, \ldots, s_n\}$ represent a finite set of
symptoms.

Let $F = \{f_1, f_2, f_3, \ldots, f_m\}$ represent a finite set of
faults.

If each of the faults $f_1, f_2, f_3, \ldots, f_m$ produces a set of
observables $sf_1, sf_2, sf_3, \ldots, sf_m$, where each
$sf_i \subseteq S$, and there exists a mapping function $Q(f_i)$ for
each $f_i$ which maps the corresponding $sf_i$ onto $F$, then each of
the faults can be uniquely determined and the model is deterministic.
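The deterministic model amounts to a lookup from observable sets to faults that is free of collisions. The following is a minimal sketch under that reading; the fault and symptom names used in the usage note are hypothetical.

```python
def build_mapping(observables):
    """Build the lookup from each set of observables to its fault.

    observables : dict mapping a fault name to its set of symptoms.
    Raises ValueError when two faults share the same observable set,
    because the model is then no longer deterministic.
    """
    mapping = {}
    for fault, symptoms in observables.items():
        key = frozenset(symptoms)
        if key in mapping:
            raise ValueError(
                f"{fault} and {mapping[key]} produce the same observables")
        mapping[key] = fault
    return mapping

def diagnose(mapping, observed):
    """Return the unique fault matching the observed symptoms, or None."""
    return mapping.get(frozenset(observed))
```

For example, with `{"f1": {"s1", "s2"}, "f2": {"s2", "s3"}}` the observation `{"s2", "s3"}` identifies `f2`, while an observation matching no stored set returns `None`.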

(b) Non-deterministic model

All the information needed to diagnose a fault is rarely available. A
variety of techniques for handling incomplete information within
computer programs have been proposed. Some examples of such
reasoning techniques[4] are as follows:

(i) Non-monotonic reasoning: Here conclusions are defeasible and so
can be deleted in the ongoing reasoning to prove or disprove a
particular diagnosis.

(ii) Probabilistic reasoning: This technique is used to represent
likely but uncertain inferences.

(iii) Techniques using the concept of belief spaces, which allow
representation of nested models of sets of beliefs.


1.3 REASONING BASED DIAGNOSIS TECHNIQUES. 

Most reasoning based diagnosis techniques use the fault-tree
method. The fault-tree reconstructs the fault from the failure of
basic components using binary logic. The essential steps involved in
fault-tree construction[5] are:

(i) Definition of the TOP event. According to standard fault-tree
terminology, the TOP event is the event whose probability of
occurrence is desired. The TOP event is defined in terms of the
system hardware. For diagnosis purposes, the TOP event is the root of
the AND-OR tree representing the fault-tree.

(ii) Definition of a set of INTERMEDIATE EVENTS. According to
fault-tree terminology, INTERMEDIATE EVENTS are those events which
can cause the TOP event either individually or in some mixed
mode. Their relation to the TOP event is defined by the boolean
operators AND and OR. For diagnosis purposes, these are used as the
symptoms.

(iii) Definition of a set of BASIC EVENTS. Each intermediate event is
treated as a TOP event and step (ii) is repeated till a set of
BASIC EVENTS is obtained. These are events defining failures of
components which need no further development for analysis
purposes. Basic events are the leaves of the fault-tree, and
reliability or failure data for such events is either available or
has to be collected for these nodes.

(iv) Editing the fault-tree so constructed to avoid
omissions, logical discrepancies and repetitions.

It is to be noted that fault-trees for components which fail in
more than one mode are separately constructed and substituted at
appropriate positions in the system fault-tree.
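The construction steps above yield an AND-OR tree whose leaves are basic events. The sketch below shows one possible in-memory form of such a tree and two evaluations over it. The class names, the event names in the usage note and the independence assumption in `probability` are illustrative assumptions, not the representation used by the diagnosis software.

```python
from dataclasses import dataclass, field
from math import prod

@dataclass
class Basic:
    name: str
    prob: float  # assumed failure probability of the basic event

@dataclass
class Gate:
    name: str
    kind: str                       # "AND" or "OR"
    children: list = field(default_factory=list)

def occurs(node, failed):
    """True if the event at `node` occurs given the set of failed basic events."""
    if isinstance(node, Basic):
        return node.name in failed
    results = [occurs(child, failed) for child in node.children]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind}")

def probability(node):
    """Probability of the event at `node`, assuming independent basic
    events that are not repeated elsewhere in the tree."""
    if isinstance(node, Basic):
        return node.prob
    probs = [probability(child) for child in node.children]
    if node.kind == "AND":
        return prod(probs)
    if node.kind == "OR":
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate kind: {node.kind}")
```

A tree such as `Gate("TOP", "OR", [Gate("G1", "AND", [a, b]), c])` can be evaluated both logically, for diagnosis, and numerically, for reliability estimation, from the same structure.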

The fault-tree method is a versatile method for analyzing complex
systems. Some of the advantages of fault-tree analysis are:

(a) Directing the analyst to diagnose failures in a deductive way.

(b) Pointing out the important aspects for the failure of interest.

(c) Providing options for qualitative or quantitative system
reliability analysis.

(d) Allowing the analyst to concentrate on a particular system at a
time.

Table 1.1 lists a few noteworthy programs developed in the last
twenty years for fault-tree analysis. They have been categorized
into a few groups. Group one consists of fault-tree construction
programs. Groups two, three and four are the analysis programs. The
analysis programs may be divided into two general types, viz. those
which directly produce numerical results (group four) and those
which first qualitatively and then quantitatively (group three)
analyze the logic system.

Table 1.1

A few noteworthy computer programs related to fault-trees

(1) Fault tree construction programs
    DRAFT (Fussell, 1973)

(2) Programs finding minimal cut sets[7]
    PREP (Vesely, 1970)

(3) Numerical evaluation programs
    KITT1, KITT2 (Vesely, 1974)

(4) Direct evaluation programs
    SAMPLE (WASH-1400)

1.4 SURVEY OF EXISTING LITERATURE. 

Diagnosis as a problem has been dealt with in a number of ways
and in a wide variety of realms ranging from medical diagnosis to
trouble-shooting in personal computers[11]. A generalized approach
has been put forward by Raymond Reiter, Johan de Kleer and Daniel
G. Bobrow[10]. In the method of diagnosis suggested by them, the
system has to be described in a suitable logic (preferably first
order) and nonmonotonic reasoning techniques are used. The method
accommodates diagnostic reasoning in a wide variety of practical
settings, including digital and analog circuits, medicine and
database updates.

Some useful diagnosis techniques are available in the field of
medical diagnosis. One such program is MYCIN[3], which uses inexact
reasoning in the domain of uncertain knowledge. Another example is
TEIRESIAS[4], which provides a way for doctors to interact with
MYCIN.

Diagnosis has also been attempted using system descriptions in a
variety of ways. Accordingly, a large number of programs have been
developed which are dedicated to representing the system
description. For example, DRAFT[6] is a fault-tree construction
program which proposes the component failure transfer function
approach. The component failure transfer functions are obtained from
a system independent analysis of every component appearing in the
system for which the fault-tree is to be constructed. However, most
of the programs associated with fault-trees deal with
quantification of the fault tree for reliability estimation
purposes and they are mostly based on Vesely's Kinetic Tree
Theory. For diagnosis purposes, the fault-tree is treated as an
AND-OR tree. One such attempt has been made by R. Davis[8], where
a diagnostic method using structure and behavior has been put
forward, and another by J. de Kleer[9] in an attempt to diagnose
multiple faults.

1.5. AN OVERVIEW OF THE DIAGNOSIS METHOD PROPOSED 

In the present case, diagnosis is done by traversing the
fault-tree of the system, which is an AND-OR search tree, starting at
the TOP event. The search is a three tier process. Using deductive
reasoning, the set of faults which have definitely occurred and the
set of faults which have definitely not occurred are first
identified. The remaining sub-set of the set of all possible faults
constitutes the set of faults about which uncertainty exists. The
uncertainty is resolved using probabilistic techniques. For this
purpose, the probability of occurrence of a particular fault is
computed using reliability data, and the technique used is a
modification of that of the medical diagnosis program MYCIN. The
history of occurrence of a particular fault is also considered for
calculating individual fault probabilities. A set of most probable
faults is thus identified.

The third step in the traversal is based on user interrogations. A
guided search is made over the set of most probable faults, using
the deductions made earlier and also those made in the course of
this search, to determine which of the faults have actually
occurred. This step is also repeated over the set of less probable
faults if required.
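The three tiers can be outlined as follows. This is a rough sketch of the control flow only: the fault probabilities are taken as given (the MYCIN-like computation is not reproduced), and the `threshold` separating most probable from less probable faults, the `ask` callback and all names are assumptions introduced for illustration.

```python
def three_tier_search(faults, deduced, prob, ask, threshold=0.5):
    """Return (faults found, number of user interrogations).

    faults    : list of candidate fault names
    deduced   : dict fault -> True/False for faults settled by deduction
    prob      : dict fault -> probability of occurrence (assumed given)
    ask       : callable(fault) -> bool, stands in for the user interrogation
    threshold : hypothetical cut between most and less probable faults
    """
    # Tier 1: faults settled by deductive reasoning.
    confirmed = [f for f in faults if deduced.get(f) is True]
    uncertain = [f for f in faults if f not in deduced]

    # Tier 2: rank the uncertain faults by probability of occurrence.
    ranked = sorted(uncertain, key=lambda f: prob[f], reverse=True)
    likely = [f for f in ranked if prob[f] >= threshold]
    unlikely = [f for f in ranked if prob[f] < threshold]

    # Tier 3: guided interrogation, less probable faults only if required.
    found, queries = [], 0
    for group in (likely, unlikely):
        for fault in group:
            queries += 1
            if ask(fault):
                found.append(fault)
        if found:
            break
    return confirmed + found, queries
```

When the actual fault lies among the most probable ones, the less probable set is never interrogated, which is where the saving over an exhaustive search comes from.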

Fault-trees were constructed for the EMSR (Electromechanical
Shutdown Rod) system of the CANDU reactors and also for some
aspects of the IBM personal computer, viz. problems related to
start up, reading/writing and the keyboard, and the search strategy
has been tested with these.

CHAPTER TWO
SIMULATION OF FAULTS

The simulator has been developed to study the steady-state
behavior of a faulty system and to use such behavior for testing the
diagnosis algorithm. It uses the concept of transfer functions and
makes quasi steady state approximations to obtain values of outputs
at intermediate time intervals. The formulation is as follows.

2.1 QUASI STEADY STATE APPROXIMATIONS FOR DYNAMIC INPUTS AND THEIR 
VALIDITY. 

If a system can be described by a transfer function of the form

$$G(s) = \frac{a_n s^n + a_{n-1} s^{n-1} + a_{n-2} s^{n-2} + \cdots + a_0}{b_m s^m + b_{m-1} s^{m-1} + b_{m-2} s^{m-2} + \cdots + b_0}$$

where $a_0 \neq 0$, $b_0 \neq 0$, then the output $Y(t)$ of such a
system at steady state due to an input $X(t)$ will be as follows:


case (i)

$X(t) = A\,U_{-1}(t)$ (step of height $A$)

$$Y(t) = L^{-1}\left[(A/s)\,G(s)\right]$$

and

$$Y(t)\big|_{t \to \infty} = \lim_{s \to 0} s\,(A/s)\,G(s) = A\,(a_0/b_0)$$

case (ii)

$X(t) = A\,U_{-2}(t)$, i.e. a ramp of slope $A$

$$Y(t) = L^{-1}\left[(A/s^2)\,G(s)\right]$$

$$Y(t)\big|_{t \to \infty} = \lim_{s \to 0} s\,(A/s^2)\,G(s) = \infty$$

case (iii)

$X(t) = A\,U_{-3}(t)$

$$Y(t) = L^{-1}\left[(2A/s^3)\,G(s)\right]$$

$$Y(t)\big|_{t \to \infty} = \lim_{s \to 0} s\,(2A/s^3)\,G(s) = \infty$$

It is thus clearly seen that the steady state value at $t = \infty$
for ramp and parabolic inputs to the above system is infinite, but the
output has a finite value at any intermediate time instant and
these values are computable using inverse Laplace transforms. Since
inversion from the 's' domain to the 't' domain is a complicated
process, a few approximations are to be made in cases (ii) and (iii),
which are as follows:

case (i)

$X(t) = A\,U_{-1}(t)$

$X(s) = A/s$

$$Y(s) = (A/s)\,G(s) = \frac{P_1}{s} + \frac{P_2(s)}{b_m s^m + b_{m-1} s^{m-1} + \cdots + b_0} \qquad (1)$$

where $P_1$ is a constant and $P_2(s)$ is any polynomial in $s$.

The term containing $P_2(s)$ may be broken down further into many more
partial fractions and its inverse will produce the transient part of
$Y(t)$. Concentrating on equation (1) and clearing the denominators,
we have

$$P_1 (b_m s^m + b_{m-1} s^{m-1} + \cdots + b_0) + s\,P_2(s) = A (a_n s^n + a_{n-1} s^{n-1} + \cdots + a_0)$$

Equating the constant terms on both sides, we have

$$P_1 b_0 = A a_0$$

or,

$$P_1 = A\,(a_0/b_0)$$

or,

$$Y(s) = A\,(a_0/b_0)\,(1/s) + \frac{P_2(s)}{b_m s^m + b_{m-1} s^{m-1} + \cdots + b_0}$$

After the initial phase, when the transients die down,
$Y(t) = A\,(a_0/b_0)$ describes the behavior of the output.
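The steady-state result for a step input only needs the two constant coefficients of the transfer function. A minimal sketch, assuming the numerator and denominator are supplied as coefficient lists with the highest power first (an input convention chosen here, not taken from the simulator):

```python
def dc_gain(num, den):
    """Return a0/b0 for coefficient lists given highest power first."""
    a0, b0 = num[-1], den[-1]
    if b0 == 0:
        raise ValueError("b0 must be non-zero for a finite steady state")
    return a0 / b0

def step_steady_state(height, num, den):
    """Steady-state output for a step of the given height."""
    return height * dc_gain(num, den)
```

For an illustrative transfer function (2s + 6)/(s^2 + 3s + 2), the gain a0/b0 is 3, so a step of height 2 settles at 6.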



case (ii)

$X(t) = A\,U_{-2}(t)$

$X(s) = A/s^2$

$$Y(s) = (A/s^2)\,G(s)$$

or,

$$Y(s) = \frac{P_1}{s} + \frac{P_2}{s^2} + \frac{P_3(s)}{b_m s^m + b_{m-1} s^{m-1} + \cdots + b_0}$$

Following similar analysis (as in the previous case) we have

$$P_2 b_0 = A a_0$$

or,

$$P_2 = A\,(a_0/b_0)$$

again,

$$P_1 b_0 + P_2 b_1 = A a_1$$

or,

$$P_1 = A\,(a_1 b_0 - a_0 b_1)/b_0^2$$

After the initial transients die out, the output $Y(t)$ can be
described by the equation

$$Y(t) = A\,(a_1 b_0 - a_0 b_1)/b_0^2 + A\,(a_0/b_0)\,t$$

It can be seen clearly that in this case there is a constant
error (difference between the actual and approximated outputs) and
this is equal in magnitude to $P_1$ if $Y(t)$ is approximated by the
relation $Y(t) = P_2 t$. The ratio $P_1/(P_2 t)$, therefore, defines
the ratio of error to the approximated output for a ramp input to the
aforesaid system. It is clearly seen that as the value of $t$
increases, this ratio decreases.
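The ramp case can be checked numerically. The sketch below computes the two partial-fraction constants for a ramp of slope A and the error ratio, and compares the asymptote with the exact ramp response of the first-order system G(s) = 1/(s + 1). That system is an assumed test case chosen because its response is known in closed form; the function names are illustrative.

```python
import math

def ramp_coefficients(A, a0, a1, b0, b1):
    """Return (P1, P2) for a ramp input of slope A."""
    p2 = A * a0 / b0
    p1 = A * (a1 * b0 - a0 * b1) / b0 ** 2
    return p1, p2

def error_ratio(A, a0, a1, b0, b1, t):
    """Magnitude of the constant error relative to the approximation P2*t."""
    p1, p2 = ramp_coefficients(A, a0, a1, b0, b1)
    return abs(p1) / abs(p2 * t)

def exact_first_order_ramp(A, t):
    """Exact ramp response of G(s) = 1/(s + 1); used only as a check."""
    return A * (t - 1.0 + math.exp(-t))
```

For this system the constants are P1 = -A and P2 = A, so the error ratio is simply 1/t and falls from 0.1 at t = 10 to 0.01 at t = 100.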
case (iii)

$X(t) = A\,U_{-3}(t)$

$X(s) = 2A/s^3$

$$Y(s) = X(s)\,G(s) = \frac{P_1}{s} + \frac{P_2}{s^2} + \frac{P_3}{s^3} + \frac{P_4(s)}{b_m s^m + b_{m-1} s^{m-1} + \cdots + b_0}$$

where $P_1$, $P_2$ and $P_3$ are constants and $P_4(s)$ is a
polynomial in $s$.

Following similar analysis, as in the previous two cases, we have

$$P_1 = 2A\,(a_2 b_0^2 - a_0 b_0 b_2 - a_1 b_0 b_1 + a_0 b_1^2)/b_0^3$$

$$P_2 = 2A\,(a_1 b_0 - a_0 b_1)/b_0^2$$

$$P_3 = 2A\,(a_0/b_0)$$

Thus after the initial transients die out, the output can be
described by the equation $Y(t) = (P_3/2)\,t^2 = A\,(a_0/b_0)\,t^2$
with an error equal to $P_1 + P_2 t$. It is evident that, unlike the
previous case, the error does not remain constant but varies linearly
with time. But the output being a function of $t^2$ and the error
being a linear function of time, the ratio of error to the
approximated output decreases with time.

From the above discussions it follows that for any general input of
the form $X(t) = A\,U_{-n}(t)$, with $X(s) = (n-1)!\,A/s^n$ as in the
three cases above, the output can be approximated by the relation
$Y(t) = (a_0/b_0)\,X(t)$ ($a_0$ and $b_0$ have already been defined)
with an error which depends on the type of the input. Moreover, the
transient parts being neglected in all cases, the approximations are
valid for all $t > t_s$, where $t_s$ is the settling time for the
system. But since simulation for normal and faulted states is done
in the same frame of reference using the same set of approximations,
there will be sufficient qualitative difference between the normal
and faulted states to differentiate between the two states.

2.2. SIMULATION BASED ON SIGNAL FLOW GRAPH MODEL.

For simulating systems with signal flow graphs, we use the
transmission, addition and multiplication properties of signal flow
graphs. The numbering of the nodes follows standard
convention. Thus, if $n_i$ and $n_j$ are two nodes in the signal flow
graph representing two variables $x_i$, $x_j$ respectively ($x_i$,
$x_j$ may be functions of time, complex frequency or any other
quantity), and if they are linked by a branch L directed from $n_i$
to $n_j$, then the link or branch transmittance for the link L will
be denoted by $A_{ij}$. The input node is always designated as $n_1$.

Using the transmission property and the multiplication properties
of signal flow graphs, a cascaded series connection of $(n - 1)$
branches with transmittances described by
$A_{12}, A_{23}, \ldots, A_{(n-1)n}$ can be replaced by a single
branch of transmission function equal to the product of the original
$(n - 1)$ branches, and

$$X_n = A_{12}\,A_{23} \cdots A_{(n-1)n}\,X_1$$

where $X_1$ is the variable associated with node 1 and $X_n$ is the
variable associated with node $n$. This property holds good for any
two nodes $n$ and $m$ with only series cascaded branches between them.
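The series multiplication property can be stated in a few lines of code. This sketch assumes the transmittances have already been reduced to real numbers and are listed in order from the input node onwards; the function names are illustrative.

```python
from math import prod

def cascade_output(transmittances, x1):
    """Value at the last node of a purely series chain driven by x1."""
    return prod(transmittances) * x1

def node_values(transmittances, x1):
    """Values at every node of the chain, starting with the input node."""
    values = [x1]
    for gain in transmittances:
        values.append(values[-1] * gain)
    return values
```

`node_values` also returns the intermediate nodal values, which is what a fault simulation needs in order to compare normal and faulted behavior node by node.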

However, when feedback and feed forward branches are also
present, the series multiplication property cannot be used
directly. Feed forward branches are described by transmission
functions of the form $A_{ij} \mid j > i$ and $j \neq i + 1$, and
feedback paths are described by transmission functions of the form
$A_{ij} \mid i > j$. It is to be noted that a branch connecting any
two consecutive nodes $i$ and $i + 1$ is a feed forward path, but we
restrict our discussions to the classical definition of feed-forward
paths.

In the following section, an algorithm is presented to obtain an
expression of the form

$$X_m = \left[\prod_{i=n}^{m-1} A_{i(i+1)}\right] X_n$$

where feedback and feed forward paths are contained between the
nodes $n$ and $m$.

It is clearly seen that the value of $a_0/b_0$ can be obtained easily
by putting $s = 0$ in the expression for $G(s)$. Therefore, for all
practical computing purposes, the value of the quantity $a_0/b_0$ can
be obtained by treating transmission functions as real numbers rather
than functions of complex frequency, and the variables associated
with the nodes can be treated as functions of time. The
multiplication property is then to be applied either directly or
after simplification using the algorithm proposed in the following
section.

2.3. TACKLING FEED-BACK AND FEED-FORWARD PATHS - A HIERARCHY BASED
METHOD

Branches described by transmission functions of the form
$A_{ij} \mid j \neq i + 1$ are either feed-back or feed-forward
branches. The signal flow graph may contain only feed-back branches
or only feed-forward branches or both, in addition to branches
describable with transmission functions of the form
$A_{ij} \mid j = i + 1$. Thus there may be three cases, which are as
follows:

case (i) Only feed-back branches are present

Let P be an ordered set of all feed-back branches. The ordering is
done according to the following rule:

#1

If $A_{i_1 j_1}$ and $A_{i_2 j_2}$ are any two members of P, then
$A_{i_2 j_2}$ succeeds $A_{i_1 j_1}$ if (a) $j_1 > j_2$ or
(b) $i_2 < i_1 \mid j_1 = j_2$.

It is to be noted that condition (b) is merely a matter of
convention. In such a case, the order in which the two entries are
placed makes no difference when only feed-back branches are
present. However, it is clearly seen from condition (b) that two
feedback branches $A_{i_1 j_1}$ and $A_{i_2 j_2}$ having $j_1 = j_2$
affect the same series branch, viz. $A_{(j-1)j}$ where
$j = j_1 = j_2$.

If more than one $A_{ij} \in P$ affect the same branch $A_{(j-1)j}$,
then the value of the transmission function $A_{(j-1)j}$ after the
m-th such branch is processed is given by

$$(A_{(j-1)j})_m = \frac{1}{\dfrac{1}{(A_{(j-1)j})_{m-1}} - \dfrac{\Delta_i}{(A_{(j-1)j})_0}}$$

where $(A_{(j-1)j})_0$ is the original value of the transmission
function before linearization and $\Delta_i$ is defined by the
following relation:

$$\Delta_i = \left(\prod_{k=j}^{i-1} A_{k(k+1)}\right) A_{ij}$$

In calculating the value of $\Delta_i$, the most recent values of all
the transmission functions involved are used.
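A small sketch of the feed-back processing follows. It reads the update rule as 1/(A)_m = 1/(A)_(m-1) - Delta/(A)_0, applied to the series branch entering node j, with Delta built from the most recent series values. That reading of the rule, the dictionary layout and the numerical values in the usage note are assumptions made for illustration.

```python
from math import prod

def apply_feedback(series, feedback):
    """Fold feed-back branches into the series branches they affect.

    series   : dict {(k, k + 1): transmittance} of the series branches
    feedback : list of (i, j, a_ij) with i > j >= 2, already in processing
               order (larger j first)
    Returns a new dict of modified series transmittances.
    """
    original = dict(series)
    current = dict(series)
    for i, j, a_ij in feedback:
        # Loop term: series branches from node j to node i, then back via a_ij.
        delta = prod(current[(k, k + 1)] for k in range(j, i)) * a_ij
        key = (j - 1, j)
        current[key] = 1.0 / (1.0 / current[key] - delta / original[key])
    return current
```

For a nine-node chain with feed-back branches from node 8 to node 5 and from node 6 to node 4, the first update divides the branch entering node 5 by (1 - D1), and the second turns the branch entering node 4 into its original value times (1 - D1)/(1 - D1 - D2), where D1 and D2 are the two loop products formed from the original values.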


case (ii) Only feed-forward branches are present.

Let Q be an ordered set of all feed-forward branches. The ordering
is done according to the following hierarchy:

#2

If $A_{i_1 j_1}$ and $A_{i_2 j_2}$ are two successive members of Q,
then $A_{i_2 j_2}$ succeeds $A_{i_1 j_1}$ if (a) $j_1 < j_2$ or
(b) $i_1 < i_2 \mid j_1 = j_2$.

Again, the condition (b) is a matter of convention and need not be
followed strictly when only feed forward branches are present in
addition to the series branches.

Again, if more than one $A_{ij} \in Q$ modify a single series branch
$A_{(j-1)j}$, then the modified value of the transmission function of
that series branch after the m-th such feed-forward branch is
processed is defined by the following relation:

$$(A_{(j-1)j})_m = (A_{(j-1)j})_{m-1} + A_{ij}/\Delta_{ij}$$

where

$$\Delta_{ij} = \prod_{k=i}^{j-2} A_{k(k+1)}$$

For $m = 0$ the value of $(A_{(j-1)j})_0$ is the original value of
the series branch transmittance before linearization. In calculating
the value of $\Delta_{ij}$, the most recent values of the
transmittances of the branches involved are used.
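The feed-forward rule can be sketched in the same style. Here a feed-forward branch from node i to node j is folded into the last series branch of the path it bypasses, by adding its transmittance divided by the product of the series branches from node i to node j - 1. The dictionary layout and the numbers in the usage note are illustrative assumptions.

```python
from math import prod

def apply_feed_forward(series, forward):
    """Fold feed-forward branches into the series branches they modify.

    series  : dict {(k, k + 1): transmittance} of the series branches
    forward : list of (i, j, a_ij) with j > i + 1, already in processing order
    Returns a new dict of modified series transmittances.
    """
    current = dict(series)
    for i, j, a_ij in forward:
        # Product of the series branches from node i up to node j - 1.
        delta = prod(current[(k, k + 1)] for k in range(i, j - 1))
        current[(j - 1, j)] += a_ij / delta
    return current
```

For a chain with transmittances 2, 3 and 4 between nodes 1 to 4 and a feed-forward branch of 5 from node 2 to node 4, the product of the modified series branches is 34, which equals the direct sum of the two paths, 2*3*4 + 2*5.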

case (iii) When both feed-back and feed-forward branches are
present.

In this case the processing algorithm is slightly complex. A few
definitions are proposed for tackling such operations.

(a) Containment of branches. A feed-back branch $A_{i_1 j_1}$ is said
to be contained by another feed-forward branch $A_{i_2 j_2}$ if
$i_2 < j_1$ and $j_2 > i_1$.

(b) Operation FB. If P be an ordered set of feed-back branches and if
Q be an ordered set of feed-forward branches, then operation FB
consists of application of the operations formulated in case (i),
using values modified by members of P only in the calculation of
each $\Delta_i$.

(c) Operation FF. If Q be an ordered set of feed-forward
branches, then operation FF consists of application of the operations
formulated in case (ii) to all members of Q. However, in calculating
the value of $\Delta_{ij}$ for a branch $A_{ij} \in Q$, the following
steps are involved:

(i) Demarcation of another ordered set $P' \subseteq P$, each member
of $P'$ being contained by the feed-forward branch in question.

(ii) Application of operation FB to the set $P'$ to obtain modified
values. In calculating the value of $\Delta_{ij}$, only the modified
values obtained through this step and those obtained in the course
of operation FF are to be used.

Finally, if $m$ elements of P and $m'$ elements of Q modify a
particular series branch $A_{(j-1)j}$, then the final modified
value of the branch transmittance $A_{(j-1)j}$ will be

$$(A_{(j-1)j})_{final} = (A_{(j-1)j})_{m'}\,(A_{(j-1)j})_m / (A_{(j-1)j})_0$$

where $(A_{(j-1)j})_{m'}$ is obtained through operation FF and
$(A_{(j-1)j})_m$ is obtained through operation FB.

EXAMPLE


Let us consider the signal flow graph shown in Fig. 2.1.

Fig. 2.1. A signal flow graph showing feed-forward and feed-back
branches.

For the above graph, the feed-back branches are $A_{85}$ and $A_{64}$
and, applying rule #1, the order in which they are to be processed is:

(i) $A_{85}$ and (ii) $A_{64}$.

For the above graph, the feed-forward branches are $A_{37}$ and
$A_{47}$ and, applying rule #2, the order in which they are to be
processed is:

(i) $A_{37}$ and (ii) $A_{47}$.

Operation FB gives:

$$(A_{45})_1 = \frac{A_{45}}{1 - \Delta_1} \quad \text{where } \Delta_1 = A_{56}\,A_{67}\,A_{78}\,A_{85}$$

$$(A_{34})_1 = \frac{A_{34}\,(1 - \Delta_1)}{1 - \Delta_1 - \Delta_2} \quad \text{where } \Delta_2 = A_{45}\,A_{56}\,A_{64}$$

Operation FF gives:

The first feed-forward branch is $A_{37}$. It contains the feedback
path $A_{64}$. Applying operation FB on $A_{64}$, we have

$$(A_{67})_1 = A_{67} + \frac{A_{37}\,(1 - \Delta_2)}{A_{34}\,A_{45}\,A_{56}}$$

The second feed-forward branch is $A_{47}$ and it also modifies
$A_{67}$. So, we have

$$(A_{67})_2 = A_{67} + \frac{A_{37}\,(1 - \Delta_2)}{A_{34}\,A_{45}\,A_{56}} + \frac{A_{47}}{A_{45}\,A_{56}}$$

Using the modified values,

$$X_9 = \left\{ \frac{A_{12}A_{23}A_{34}A_{45}A_{56}A_{67}A_{78}A_{89}}{1 - \Delta_1 - \Delta_2} + \frac{A_{12}A_{23}A_{37}A_{78}A_{89}\,(1 - \Delta_2)}{1 - \Delta_1 - \Delta_2} + \frac{A_{12}A_{23}A_{34}A_{47}A_{78}A_{89}}{1 - \Delta_1 - \Delta_2} \right\} X_1$$

which is precisely the result which Mason's gain formula for signal
flow graphs gives.

The above formulation has been tried on a large number of
graphs. However, it is quite possible that some configuration exists
for which it fails. One disadvantage of the method is its inability
to tackle self loops. One great advantage of it is the ability to
compute intermediate nodal values in cases where feed-back and/or
feed-forward branches are also present.

This ability to compute intermediate nodal values has been used
advantageously for fault simulation. Since the system configuration
is fixed, a fault corresponds to a change in the transmittance of
one or more branches. The changes in the intermediate values and the
terminal value serve as an indication of a fault when they deviate
from the normal.

2.4. SOME COMPONENTS AND THEIR REPRESENTATION AS SUB-GRAPHS


Some of the common mechanical and electrical components mostly
required for electromechanical systems have been modeled as
sub graphs. These sub graphs, when connected as required, build up the
signal flow graph for the entire system. However, the menu of such
components has been purposefully kept confined to a few selected
components only, as this simulator was primarily developed to
simulate the electromechanical shutdown rod system for the CANDU
reactors, which required a few selected components. The respective
components and the sub graphs corresponding to each are as follows:
(i) Motor. The motor modeled here is a dc servo motor in the
armature controlled mode with constant field excitation. The
variable associated with the first node is the input voltage $v_i$.
The second node represents the voltage across the armature after
considering the feed-back due to the back e.m.f. of the motor. The
third node represents the armature current. The fourth node
represents the motor torque and the fifth represents the motor
speed. The branch transmittances are as follows (neglecting the 's'
terms):

$A_{12} = 1.0$

$A_{23} = 1/r_a$ where $r_a$ is the armature resistance

$A_{34} = K_m$ where $K_m$ is the motor torque constant

$A_{45} = 1/F$ where $F$ is the friction factor

$A_{56} = 1.0$

$A_{52} = -K_m$ (this is also the e.m.f. constant)

Note: The value of the effective friction factor referred to the
motor shaft can be obtained at steady state knowing the operating
conditions of the motor.
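The motor sub graph reduces to a single loop at steady state, which gives a quick check on the transmittances. The sketch below assumes the readings used for illustration here: forward branches 1/r_a, K and 1/F, and a feed-back branch of -K from the speed node to the armature voltage node. The parameter names and numerical values are hypothetical.

```python
def motor_steady_state_speed(v_in, r_a, k_m, friction):
    """Steady-state speed of the armature controlled dc motor sub graph.

    v_in     : input voltage
    r_a      : armature resistance
    k_m      : torque constant, also taken as the e.m.f. constant
    friction : effective friction factor referred to the motor shaft
    """
    forward = (1.0 / r_a) * k_m * (1.0 / friction)  # voltage to speed
    loop = forward * (-k_m)                          # back e.m.f. loop
    return v_in * forward / (1.0 - loop)
```

A fault such as increased friction shows up as a lower steady-state speed for the same input voltage, which is the kind of deviation the simulator is meant to expose.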

(ii) Rotating parts like gears and pulleys. Each rotating part, like
an assembly of two gears or an assembly of a driver and a driven
pulley, is modeled as a three node sub graph. All three nodes
represent speed. The branch transmittances are as follows:

$A_{12} = g$ where $g$ is the ratio of driven shaft speed to driving
shaft speed.

$A_{23} = 1.0$

For simulating faults in the gear/pulley assembly, the value of $g$
is altered, and for simulating faults associated with the driven
shaft only, the value of $A_{23}$ is altered. For pulleys it is
further assumed that no slipping occurs.

(iii) Drum. A drum is modeled as a two node sub graph with the
transmission function $A_{12} = 1.0$.

(iv) Controllers. All controllers are modeled as proportional
devices. Each controller is represented by a two node sub graph. The
first node and the second node both represent voltages. A voltage
signal can be fed into the first node from any other node in the
system signal flow graph. The branch transmittance is
$A_{12} = K_p$ where $K_p$ is the controller gain.

(v) Potentiometers. A potentiometer is modeled as a two node sub
graph with the transmittance $A_{12} = K$, where $K$ is the
potentiometer ratio.

(vi) NC and NO contactors. A normally closed (NC) contactor is
represented by a two-node sub graph with $A_{12} = 1$ and a normally
open (NO) contactor is represented by a two-node sub graph having
$A_{12} = 0$ initially. The change in the value of the transmittance
of a contactor is brought about by the relay which acts on that
contactor. Relays are modeled in the same way as faults. A relay or
a fault keeps a record of two things, viz. the node in the graph
which it affects and the time instant (w.r.t. the simulator clock) at
which it occurs. The simulator also permits simulation of processes
involving timed operation of relays.
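Since relays and faults are both described as timed changes to the graph, one way to picture them is as a list of events applied against the simulator clock. The sketch below is an assumed representation: it keys each event by the branch whose transmittance changes, and the `Event` class and function name are introduced here for illustration.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: float    # instant w.r.t. the simulator clock
    branch: tuple  # (from_node, to_node) whose transmittance changes
    value: float   # new transmittance after the event

def transmittances_at(base, events, t):
    """Return the branch transmittances in force at simulator time t."""
    current = dict(base)
    for event in sorted(events, key=lambda e: e.time):
        if event.time <= t:
            current[event.branch] = event.value
    return current
```

A normally open contactor starts with transmittance 0 and a relay event at, say, t = 1.5 sets it to 1; a fault is recorded in exactly the same way, with the faulted value of the branch in place of the relay's.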

(vii) Triggers. A typical example of this is a comparator or a
magnetic clutch. This is also modeled as a two node sub graph. The
transmittance $A_{12}$ swings from one predetermined value to another
predetermined value as soon as the value of the variable associated
with the first node crosses a certain predetermined threshold.

Feedback and feed-forward paths can only be modeled as scalers
or potentiometers.

It is to be noted that the numbering of nodes for each sub graph
mentioned above is done w.r.t. the sub graph only. Fig. 2.2
shows the sub graph associated with a motor as an example.




2.5. SIMULATION OF EMSR SYSTEM FOR CANDU REACTORS.


The reactor shutdown mechanism for CANDU reactors consists of two
shutdown systems, viz. the EMSR (Electromechanical shutdown rod) and
the LPR (Liquid poison shutdown rod) mechanisms. After an abnormality
worthy of reactor shutdown is detected by the monitoring
instruments, the shutdown rod is expected to insert completely in
the core within 2 seconds, including the time required to
communicate the failure through the instrument channels.

Fig. 4.1 shows the schematic representation of a single EMSR. From 
reactor physics considerations 12 such rods are required to 
introduce sufficient poison into the core in order to stop the 
fission chain reaction. There are 14 EMSRs triggered by 7 pairs of 
scram signal circuits. 

The operation of a single EMSR is as follows. On receiving the 
scram signal, the rotating magnetic clutch MCR under the force of 
the release spring SR is disengaged. Under the weight of the 
shutdown rod, the cable on the drum unwinds freely and is guided by 
the pulleys P and P^ and the guide tube. The fall of the rod is 
impeded by the friction between the gears and between shafts and
bearings, and by viscous drag due to the moderator. To overcome the
effect of these impeding factors, acceleration spring SPR1 is
provided. Under normal conditions the rod is pulled against SPR1 so
that on release a very high initial velocity is achieved. At the end
of the journey, the rod comes to rest on a support SP and, to avoid
vibrations, a second spring SPR2 is provided.

The EMSR system is said to operate successfully if 12 out of 14
rods reach the end support SP in the guide tube within 2 secs.
Because the rods are triggered in pairs by 7 scram signal circuits,
failure of two or more instrument channels out of 7 or failure of 3
out of 14 rods to insert completely in the core is considered as
EMSR failure.

However, there is one very vital point which is to be accounted
for. The spring is an active device. So far in the simulation
algorithm it has been assumed that the driving function is always
applied at the input node. In this case, as soon as the magnetic
clutch releases, the input terminal of the spring replaces the input
node and acts as the pseudo input node. A ramp approximation is made
for the motion of the rod through the moderator. However, the actual
trajectory of the rod through the moderator is quite different.

For simulation purposes, the signal flow graph of the system apart
from the spring-loaded shutdown rod is constructed using the
subgraphs for each component, and the spring-loaded shutdown rod is
treated as a separate entity as a whole. The input terminal of the
spring is connected to some rotating member, in this case the
pulley. It is to be noted that the angular displacement of the
rotating member depends on the vertical displacement of the rod and
so the magnitude of the pseudo input node driving function can be
computed.

Let $X_i(t)$ be the nodal value at the spring output and let
$X_i(t) = Kt$ rather than $K\delta(t)$.

Let $X_j(t)$ be the displacement of the rod from the initial
position. For the present approximation $X_j(t) = X_i(t)$.

Let $X_k(t) = R X_j(t)$ be the reactivity produced in milli-k by the
displacement $X_j(t)$ at any instant t. R is assumed to be constant.
It is to be noted that $X_k(t)$ is negative for a positive $X_j(t)$.
$X_j(t)$ is positive for motion of the shutdown rod into the core.

If L be the displacement of the rod at $t = t_f$, where $t_f$ is the
time of fall required, and $\rho_f$ be the value of $X_k(t)$ at
$t = t_f$, we can define $\delta X_j(t) = L/t_f$ and
$\delta X_k(t) = \rho_f/t_f$.

Approximating $X_j(t_i) = X_j(t_{i-1}) + \delta X_j(t)$, where
$X_j(t_n) = L$, we have $X_k(t_i) = X_k(t_{i-1}) + \delta X_k(t)$ and
$X_k(t_n) = \rho_f$, n being the number of time steps considered, and
$X_j(t_0) = X_k(t_0) = 0$.

Considering a single group of delayed neutron precursors only, the
neutron population at any time t following a step change in
reactivity $\rho$ from steady state is given by the equation

$n(t) = n_0 [A \exp(\alpha_1 t) - B \exp(\alpha_2 t)]$

where A, B, $\alpha_1$ and $\alpha_2$ are defined in the following way:

$A = \beta / (\beta - \rho)$;

$B = \rho / (\beta - \rho)$;

$\alpha_1 = \lambda \rho / (\beta - \rho)$;

$\alpha_2 = (\rho - \beta) / \Lambda$;

and

$\beta$ = fraction of delayed neutrons produced during fission
compared to the total (delayed + prompt). Typically, $\beta$ = 0.0065.

$\rho$ = magnitude of the step change in reactivity.

$\lambda$ = mean decay rate of the delayed neutron precursors,
typically $\lambda$ = 0.08 sec$^{-1}$.

$\Lambda$ = neutron production time, typically 0.1 ms.

$n_0$ = neutron population at steady state.

The reactivity change during the shutdown process can be
considered as a step change if the time taken by the rod to reach
the end support is very small. Since the time is appreciable
(approximately 2 sec.), this step approximation is not valid. The
reactivity, however, remains constant after the rod reaches the end
support and rests there. The actual variation of the reactivity
during shutdown is governed by the motion of the rod itself. For
convenience, the reactivity change has been approximated as a ramp
function during the journey of the rod through the moderator. To
calculate the neutron populations at different time instants during
the journey of the rod through the moderator, the following formula
is applied:
$n(t_i) = n(t_{i-1}) [A' \exp(\alpha'_1 t) - B' \exp(\alpha'_2 t)]$

where $A' = \beta / (\beta - X_k(t_i))$

$B' = X_k(t_i) / (\beta - X_k(t_i))$

$\alpha'_1 = \lambda X_k(t_i) / (\beta - X_k(t_i))$

$\alpha'_2 = (X_k(t_i) - \beta) / \Lambda$

The above formulation is not strictly valid since, the moment a
step change in reactivity is introduced into a reactor at steady
state, a prompt jump in neutron population occurs. But in the
present case, the reactivity change is on the negative side. So for
a small time step, and hence a small reactivity step, this
approximation gives fairly satisfactory results. Moreover, since
both the normal and the simulated responses are calculated under the
same approximations, there is sufficient qualitative difference
between the response in the normal state and that in the faulty
state to enable detection of faults. The values of $\rho_f$, L and
$t_f$ are to be provided by the user when the nuclear interface is
built.
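
The stepwise calculation can be sketched as follows. This is a minimal illustration, not the simulator's code: the function names are hypothetical, reactivity is expressed in absolute units rather than milli-k, the constants are the typical values quoted in the text, and the time argument of the exponentials is assumed to be the step length.

```python
import math

BETA = 0.0065      # delayed neutron fraction (typical value from the text)
LAMBDA = 0.08      # precursor decay rate in 1/s (typical value from the text)
GEN_TIME = 1.0e-4  # neutron production time in s (0.1 ms)


def step_factor(rho: float, dt: float) -> float:
    """n(t + dt) / n(t) for a reactivity rho held over one step."""
    a = BETA / (BETA - rho)
    b = rho / (BETA - rho)
    alpha1 = LAMBDA * rho / (BETA - rho)
    alpha2 = (rho - BETA) / GEN_TIME
    return a * math.exp(alpha1 * dt) - b * math.exp(alpha2 * dt)


def ramp_shutdown(n0: float, rho_f: float, t_f: float, steps: int):
    """Neutron population at each step while reactivity ramps from 0 to rho_f."""
    dt = t_f / steps
    d_rho = rho_f / steps          # reactivity increment per step
    n, rho, history = n0, 0.0, [n0]
    for _ in range(steps):
        rho += d_rho               # X_k(t_i) = X_k(t_{i-1}) + increment
        n *= step_factor(rho, dt)  # n(t_i) = n(t_{i-1}) * [A' exp - B' exp]
        history.append(n)
    return history


# Example: 30 mk of negative reactivity inserted over 2 s in 20 steps
population = ramp_shutdown(n0=1.0, rho_f=-0.03, t_f=2.0, steps=20)
```

Comparing such a trace for the normal case with one computed for a slower or incomplete rod insertion gives the qualitative difference the text relies on for fault detection.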
CHAPTER THREE
THE DIAGNOSIS ALGORITHM

Diagnosis is based on a search of the system fault-tree,
which is represented by an AND-OR tree and is entered by the
user in the form of a set of Boolean equations, each
representing the causal relationship between a fault and its
causes. The entries are in the Sum of Terms form where the left
hand side represents some intermediate event (symptom) and the
right hand side is an expression involving disjunction of a
number of terms, each term being an event or a conjunction of
two or more events. The traversal algorithm is a three-tier
process starting with the TOP event as the root and gives as
its output the set of faults which have occurred. The first step
is based on purely deductive reasoning. The second step involves
MYCIN-like uncertainty handling techniques to demarcate a set
of most probable faults from the set of all possible faults
about which some ambiguity exists. The third step is based on
user interrogations and is used to identify the faults which
have definitely occurred.

3.1 KNOWLEDGE REPRESENTATION 

The knowledge is represented in the form of an AND-OR
tree. The TOP event is the root of the tree. Each node in the
tree has at least two attributes associated with it depending
on whether it represents an INTERMEDIATE EVENT or a BASIC
EVENT. These are:

(a) INSTRUMENTATION STATUS if it is an intermediate event. It
denotes whether or not there is an instrumented fault indicator
for the intermediate event in question. If there is an
instrumented fault indicator, then it is possible to know
definitely the state of the intermediate event in question, else
it may or may not be possible to check for the state of the
event in question.

(b) PROBABILITY OF OCCURRENCE if it is a basic event. This
attribute is not static for a node and has to be computed.

(c) OPEN or CLOSED attribute to denote whether the node has to
be checked (when it is still OPEN) or not (when it is CLOSED). A
node remains OPEN until a query has been made about the
status of the event represented by it.

Thus, for two nodes i and j representing two events $S_i$ and $S_j$
respectively and with a link $L_{ij}$, the direction being from node
i to node j, the semantic significance is:

if $S_i$ occurs then there is a probability $P_{ij}$ that $S_j$ is the
cause for it.

This is similar to MYCIN rules. But a set of static rules
cannot be used in this case, as the aforesaid probability $P_{ij}$
is dynamic and has to be normalized over the set of possible
faults having some ambiguity. The probability of occurrence of a
fault depends on the equivalent failure rate of the components
associated with it, the time of operation and, of course, the
conditions associated with it, all of which can vary widely. Hence
normalization over a finite domain is necessary. Another
advantage of this representation, which follows from the first,
is the ability to use the same graph repeatedly, computing the
probability values as and when required rather than forming a
fresh set of rules each time as required by a static
representation like MYCIN. For example, even after a certain
fault has occurred, a static set of rules will assign the same
probability of occurrence to it for the same set of symptoms,
though in real life the probability of occurrence of the fault in
the near future is much lower. With static rules, a fresh set of
probabilities would have to be assigned to each fault and a new
set of rules formulated to overcome this problem.

However, implementation is done by predicates which store
information about causal relationships between the faults and
the symptoms.

Let T denote the TOP event;

Let $S = [s_1, s_2, s_3, \dots, s_n]$ denote the set of all
intermediate events.

Let $E = [e_1, e_2, e_3, \dots, e_m]$ denote the set of all
possible basic events.

In the light of the above discussions, the following
predicates are defined:

(i) can_occ_frst($s_i$, $sE_i$) where $s_i$ is any intermediate event
and $sE_i \in p(E)$, where p(E) is the power set of E.

(ii) possible_frst_if($e_i$, $sS_i$) where $e_i$ is any basic event and
$sS_i \in p(S)$, p(S) being the power set of S.

The semantic significance of the above two predicates is
brought out by the following rules:

(1) can_occ_frst($s_i$, $sE_i$) $\wedge$ $\neg$occurs($s_i$) $\supset$
{$\neg$occurs($e_j$) | $\forall e_j$ in $sE_i$}

The rule states that if a certain intermediate event is not
seen then none of the basic events causing it could have
occurred.

(2) possible_frst_if($e_i$, $sS_i$) $\wedge$ occurs($e_i$) $\supset$
{occurs($s_j$) | $\forall s_j$ in $sS_i$}

The second rule states that if any basic event occurs, then
all the intermediate events caused by it occur.


As an example, let us consider the following formulations
defining the cause effect relationship between a set of
symptoms (intermediate events) $s_1$, $s_2$ and $s_3$ and a set of
causes (basic events) $e_1$, $e_2$, $e_3$, $e_4$ and $e_5$.

Example 1.

$s_1 = e_1 + e_2 + e_3$
$s_2 = e_3 + e_4$
$s_3 = e_1 + e_5$

The corresponding predicates will be:

can_occ_frst($s_1$, [$e_1$, $e_2$, $e_3$])
can_occ_frst($s_2$, [$e_3$, $e_4$])
can_occ_frst($s_3$, [$e_1$, $e_5$])
possible_frst_if($e_1$, [$s_1$, $s_3$])
possible_frst_if($e_2$, [$s_1$])
possible_frst_if($e_3$, [$s_1$, $s_2$])
possible_frst_if($e_4$, [$s_2$])
possible_frst_if($e_5$, [$s_3$])

The formulation of the above rules requires that each
boolean relationship representing some cause-effect
relationship is a function involving disjunction of a set of
basic events only. This is the simplest case and it is the
desired form of representation. For handling cases involving
disjunction and/or conjunction of basic and/or intermediate
events some transformations are necessary.
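
For the simplest case, both predicates can be derived mechanically from the Boolean equations. The sketch below is illustrative only: `build_predicates` is a hypothetical helper, the predicates are held as Python dictionaries of sets rather than as logic clauses, and Example 1 is read here as s1 = e1 + e2 + e3, s2 = e3 + e4, s3 = e1 + e5.

```python
def build_predicates(equations):
    """equations maps each intermediate event to the basic events that can
    cause it (a pure disjunction, as the simplest case requires)."""
    can_occ_frst = {s: set(causes) for s, causes in equations.items()}
    possible_frst_if = {}
    for s, causes in equations.items():
        for e in causes:
            # invert the relation: which symptoms can this basic event cause?
            possible_frst_if.setdefault(e, set()).add(s)
    return can_occ_frst, possible_frst_if


example_1 = {
    "s1": ["e1", "e2", "e3"],
    "s2": ["e3", "e4"],
    "s3": ["e1", "e5"],
}
can_occ_frst, possible_frst_if = build_predicates(example_1)
```

Holding the inverse relation explicitly lets rule (2) be applied with a single lookup instead of a scan over all equations.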

The user, however, enters the fault-tree in the form of a set
of boolean equations. Two cases have been dealt with
separately. These are as follows:

case (a). A set of intermediate events is the cause for another
intermediate event $s_i$ and the logical formulation is
consistent.

A predicate causes(intermediate event, set of basic events, set
of intermediate events) is defined to mean that the first
argument is caused by any basic event in the second or by any
intermediate event in the third.

The following transformation rules may be applied in this
case to transform the formulation to a form which can be
expressed with the help of the two predicates can_occ_frst and
possible_frst_if:

#1 causes($s_i$, $sE_i$, $sS_i$) $\supset$
{$\forall s_j \in sS_i$ causes($s_i$, $sE_i$, [$s_j$])}.

#2 causes($s_i$, $sE_i$, [$s_j$]) $\wedge$ can_occ_frst($s_j$, $sE_j$) $\supset$
{can_occ_frst($s_i$, $sE_k$) | $sE_k := sE_i \cup sE_j$}.

The truth table in the simplest form can be obtained by
applying transformation rules #1 and #2 to elements of S
whenever required.
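
Applied repeatedly, rules #1 and #2 replace every intermediate cause by its own basic causes. A minimal sketch, assuming the `causes` predicate is held as a dictionary and that a cyclic definition counts as an inconsistent formulation (the function name `flatten` is hypothetical):

```python
def flatten(causes):
    """causes maps an intermediate event to (basic_events, intermediate_events).
    Returns can_occ_frst with every intermediate cause replaced by the union
    of its own basic causes (rules #1 and #2 applied until none remain)."""
    resolved = {}

    def expand(s, seen=()):
        if s in resolved:
            return resolved[s]
        if s in seen:
            raise ValueError(f"cyclic definition involving {s}")
        basic, inter = causes[s]
        result = set(basic)
        for child in inter:                      # rule #1: one child at a time
            result |= expand(child, seen + (s,))  # rule #2: sE_i U sE_j
        resolved[s] = result
        return result

    for s in causes:
        expand(s)
    return resolved


# Hypothetical input: s1 is caused by e1 or by the intermediate events s2, s3
tree = {
    "s1": (["e1"], ["s2", "s3"]),
    "s2": (["e2", "e3"], []),
    "s3": (["e3"], ["s2"]),
}
flat = flatten(tree)
```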

case (b). This is a more general case where the boolean equation
involving the causal relationship between the symptoms and the
causes involves disjunction and/or conjunction of intermediate
and/or basic events. A typical formulation in the clausal form
may be considered.

$s_i$ :- $s_{k1} \wedge s_{k2} \wedge \dots \wedge s_{km}$.
$s_i$ :- $s_{l1} \wedge s_{l2} \wedge \dots \wedge s_{ln}$.

where $s_{kp}$, $s_{lq}$ can be a basic event or an intermediate event.

In this case the following set of formulations are attempted to
resolve the conjunctions:

Transformation #1.

The aforesaid clausal formulation is to be rewritten as:

$s_i$ :- $s_k^*$.
$s_i$ :- $s_l^*$.
$\forall p$ in [1, m], $s_{kp}$ :- $s_k^*$.
$\forall q$ in [1, n], $s_{lq}$ :- $s_l^*$.

Transformation #2.

Each of the events $s_{kp}$, $s_{lq}$ is to be treated as an
intermediate event. They must have instrumented indicators to
determine their state uniquely. The events $s_k^*$, $s_l^*$ are to be
treated as basic events of zero equivalent failure rate to avoid
traversal of closed nodes. Transformation rules #1 & #2 cited in
case (a) are to be applied now, the truth table in the desired form
is to be obtained, and the standard predicates defined may be used.

It is clearly seen that the conjunctions are very costly from
the instrumentation point of view.

The transformations #1 & #2 can be applied to any finite
number of conjunctions associated with $s_i$.

EXAMPLE 2.

$s_1 = a.b + c$
$s_2 = d$
$s_3 = a + e$

This can be represented by an alternative formulation which
is not exactly logically equivalent but produces approximately
the same trajectory.

$s_1 = s^* + c$
$a = s^*$
$b = s^*$
$s_2 = d$
$s_3 = a + e$.

The basic events originally represented by a and b should be
treated as intermediate events, $s^*$ should be treated as a basic
event, and there should be instrumented fault indicators for
both a and b.

The logical inequivalence lies in the fact that the above
formulation indicates that $s^*$ cannot occur if both a and b do
not occur, which is quite true, but fails to utilise the fact
that if $s^*$ occurs then both a and b must have
occurred. However, since $s^*$ is not checked at all, the second
implication has no significance.

3.2 SEARCH BASED ON DEDUCTIVE REASONING 

Let $S_a$ be the set of all intermediate events which have
instrumented fault indicators;

Let $S_b := S - S_a$ be the set of all intermediate events which do
not have instrumented fault indicators;

It is to be noted that all events in S can be checked by the
user when required and three answers 'yes', 'no' and 'don't
know' are possible to such a query.

Let $S_d \subseteq S_a$ be the set of intermediate events which have
occurred;

Therefore, $S_e := S_a - S_d$ is the set of intermediate events which
have not occurred.

The following routines are then used to separate out the set
of faults which are definitely known to have occurred and the
set of faults which are definitely known to have not
occurred. The remaining subset of E is the set of faults about
which some ambiguity exists.


Routine #1.

$\forall s_i$ in $S_e$, can_occ_frst($s_i$, $sE_i$) $\supset$
{$E_i := E_{i-1} - sE_i$}, where $E_i$ is the set of possible basic
events at the time of processing the i-th intermediate event in
$S_e$ and it is initialized to E. This rule states that the causes
corresponding to each intermediate event in $S_e$ can be deleted from
the set of possible faults. The predicates of the form
can_occ_frst(intermediate event, set of basic events) are to be
updated and $sE_i$ is to be subtracted from all these using the
following rule for all elements in $S_d$:

$\forall s_j \in S_d$, can_occ_frst($s_j$, $sE_j$) $\supset$
{can_occ_frst($s_j$, $sE'_j$)}, where $sE'_j := sE_j - E_a$.

Routine #2.

$\forall s_i$ in $S_d$, possible_frst_if($e_k$, [$s_i$]) $\supset$
{$E_d^i := E_d^{i-1} + [e_k]$}, where $e_k \in E$ and $E_d^i$ is the set
of basic events which are definitely known to have occurred at the
time of processing the i-th intermediate event in $S_d$ and it is
initialized to a null set.

This rule states that if a particular intermediate event occurs
and if it can be caused by one particular basic event only, then
that basic event definitely occurs.

The set $E_b := E - (E_a + E_d)$ is the set of faults about which
some ambiguity exists, $E_a$ and $E_d$ being the set of faults which
are definitely known to be absent and the set of faults which
are definitely known to be present respectively; they are
obtained through routines #1 and #2. Thus, for Example 1, if
$S_a = [s_1, s_2]$ and $S_d = [s_2]$, then $E_a = [e_1, e_2, e_3]$,
$E_d = [e_4]$ and $E_b = [e_5]$.
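
The two routines can be sketched over plain Python sets. This is a simplified illustration with hypothetical names; Routine #2 is implemented as its prose statement reads (an occurred symptom with exactly one remaining possible cause makes that cause definite), and Example 1 is read as s1 = e1 + e2 + e3, s2 = e3 + e4, s3 = e1 + e5.

```python
def deductive_search(can_occ_frst, all_events, s_d, s_e):
    """s_d: instrumented symptoms that occurred; s_e: those that did not.
    Returns (E_a, E_d, E_b): absent, definite and ambiguous faults."""
    e_a = set()
    for s in s_e:
        # Routine #1: an absent symptom rules out all of its causes
        e_a |= can_occ_frst[s]
    e_d = set()
    for s in s_d:
        # Routine #2: a single remaining cause must have occurred
        remaining = can_occ_frst[s] - e_a
        if len(remaining) == 1:
            e_d |= remaining
    e_b = set(all_events) - e_a - e_d
    return e_a, e_d, e_b


can_occ_frst = {
    "s1": {"e1", "e2", "e3"},
    "s2": {"e3", "e4"},
    "s3": {"e1", "e5"},
}
events = ["e1", "e2", "e3", "e4", "e5"]
# s1 and s2 are instrumented; only s2 is observed
e_a, e_d, e_b = deductive_search(can_occ_frst, events, s_d={"s2"}, s_e={"s1"})
```

With these inputs the sketch leaves only e5 in the ambiguous set, which is then passed to the probabilistic tier.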



3.3 FORMATION OF A SET OF MOST PROBABLE FAULTS 


If the set $E_b$ is not a null set, then further reduction of $E_b$
may be possible using probabilistic techniques. The method
followed in this case is a modification of MYCIN.

However, the major deviation from MYCIN lies in the fact that
symptoms in this case provide only positive inferences about
the occurrence of faults, unlike MYCIN, in which the rules are
framed in such a manner that a particular symptom can add to
belief or disbelief about a particular hypothesis. Accordingly,
the method computes only the belief measure for a particular
fault when the set of symptoms dictates that the fault is
possible. Computing disbelief is not required because the method
is designed to be used on a set of faults all of which are
possible for a given set of symptoms.

$\forall e_i$ in $E_b$,

let $p_a(e_i) = 1.0/n$ denote the apparent probability of occurrence
of a particular fault $e_i$ considering equal probability for all
faults, n being the number of elements in $E_b$;

let $p(e_i) = 1 - e^{-\lambda_i T_i}$ denote the actual probability of
occurrence of basic event $e_i$, $\lambda_i$ being the failure rate
associated with basic event $e_i$ and $T_i$ being the corresponding
time of operation for the components associated with it;

let $p(e_i|T)$ denote the normalized probability of occurrence of
the event $e_i$ if the TOP event T occurs.

In the computation of $p(e_i)$, we assume the following:

(1) A fault is a permanent state and can only be removed
through repairs and/or replacements and not by itself.

(2) Components associated with a fault get a fresh life once a
particular fault is removed either through repair or through
component replacement.

The value of the normalized probability $p(e_i|T)$ for a basic
event $e_i \in E_b$ can be computed using the following relation:

$p(e_i|T) = p(e_i) / \sum_{k=1}^{m} p(e_k)$, where m is the number of
elements in $E_b$.

The values of $p(e_i|T)$ and $p_a(e_i)$ thus obtained are used in the
computation of a certainty factor to indicate whether it is
worth searching for the particular basic event or not.

The certainty factor is defined in the following way:

C.F = 1 if $p(e_i) = 1$;

otherwise C.F = $p(e_i|T) - p_a(e_i)$.

The C.F is computed for all $e_i$ in $E_b$, and basic events with
C.F < 0 are deleted from the set $E_b$. The set $E_v$ that remains
after this subset of less probable faults has been removed from
$E_b$ represents the set of most probable faults. $E_v$ defines the
domain of search for the next tier.
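
The pruning step can be sketched as follows. The function name and the use of a single operating time for all faults are assumptions made for illustration, and the certainty factor is taken as the normalized probability minus the apparent (uniform) probability 1/n.

```python
import math


def most_probable(rates, hours):
    """rates: failure rate per hour for each fault in E_b (assumed units).
    hours: common time of operation (an assumption of this sketch).
    Returns (kept, dropped) after certainty-factor pruning."""
    n = len(rates)
    # actual probability of occurrence: p = 1 - exp(-lambda * T)
    p = {e: 1.0 - math.exp(-lam * hours) for e, lam in rates.items()}
    total = sum(p.values())
    kept, dropped = [], []
    for e, pe in p.items():
        if pe == 1.0:
            cf = 1.0
        else:
            normalized = pe / total if total > 0 else 0.0
            cf = normalized - 1.0 / n   # normalized minus apparent probability
        (kept if cf >= 0 else dropped).append(e)
    return kept, dropped


# Hypothetical failure rates (per hour) for three ambiguous faults
kept, dropped = most_probable({"a": 4e-6, "b": 1e-6, "c": 1e-6}, hours=1000.0)
```

Faults whose normalized probability falls below the uniform share 1/n are set aside and are examined only if the more probable ones are all ruled out.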

3.4 ELIMINATION BY USER INTERROGATION. 

Let $S_x$ define the initial set of intermediate events about
which no conclusion can be drawn. Elements may be added to it or
deleted from it in course of the interrogations. It is
initialized to the null set.

Let $E_x$ define the initial set of basic events about which no
conclusion can be drawn. It is initialized to a null set.

Let $E_o$ define the set of OPEN basic events and it is
initialized to $E_v$. A basic event (or more specifically the node
representing it) is said to be OPEN if it is not a member of $E_a$
or $E_d$ or $E_x$.

Let $S_o$ define the set of OPEN intermediate events. An
intermediate event is said to be OPEN if it is not a member of
$S_d$ or $S_e$ or $S_x$.

$\forall e_i$ in $E_v$, the following sequence of operations is
performed:

(i) test for occurrence of $e_i$:

($e_i$ in $E_o$) $\wedge$ possible_frst_if($e_i$, $sS_i$) $\wedge$
{$s_j \notin S_e$ | $\forall s_j$ in $sS_i$}.

While testing the individual $s_j \in sS_i$, the following are to be
considered:

(a) $s_j$ is to be tested if $s_j \in S_o$.
(b) if occurs($s_j$) then $S_d := S_d + [s_j]$.
(c) if not occurs($s_j$) then $S_e := S_e + [s_j]$; rule #1 cited in
section 3.2 is to be applied on $s_j$.
(d) if it cannot be concluded whether occurs($s_j$) or not
occurs($s_j$) then $S_x := S_x + [s_j]$.

(ii) after testing $e_i$, any one of the following is to be done:

(a) if occurs($e_i$) and possible_frst_if($e_i$, $sS_i$) then
$S_d := S_d + sS_i$, $S_x := S_x - sS_i$, $E_d := E_d + [e_i]$.
(b) if not occurs($e_i$) then $E_a := E_a + [e_i]$.
(c) if it cannot be determined whether occurs($e_i$) or not
occurs($e_i$) then $E_x := E_x + [e_i]$.

If after testing all the elements of $E_v$, $E_d = []$, then less
probable faults are to be considered. For this purpose, $E_v$ is
reinitialized to the subset of less probable faults and the
sequence of operations formulated in (i) and (ii) is to be
performed again.
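
One pass of the interrogation tier can be sketched as below. This is a simplified reading of steps (i) and (ii), not the original implementation: the user is modelled by two hypothetical callbacks returning True, False or None (for "don't know"), and all sets are ordinary Python sets updated in place.

```python
def interrogate(e_v, possible_frst_if, can_occ_frst, ask_symptom, ask_fault,
                s_d, s_e, e_a):
    """One pass over the most probable faults e_v.
    s_d, s_e and e_a are the sets left by the deductive tier."""
    s_x, e_x, e_d = set(), set(), set()
    for e in e_v:
        if e in e_a or e in e_d or e in e_x:       # node is CLOSED
            continue
        for s in possible_frst_if[e]:
            if s in s_d or s in s_e or s in s_x:   # symptom already CLOSED
                continue
            answer = ask_symptom(s)
            if answer is True:
                s_d.add(s)
            elif answer is False:
                s_e.add(s)
                e_a |= can_occ_frst[s]             # rule #1 of section 3.2
            else:
                s_x.add(s)
        if e in e_a or possible_frst_if[e] & s_e:
            continue                               # ruled out by an absent symptom
        answer = ask_fault(e)
        if answer is True:
            s_d |= possible_frst_if[e]
            s_x -= possible_frst_if[e]
            e_d.add(e)
        elif answer is False:
            e_a.add(e)
        else:
            e_x.add(e)
    return e_d, e_x
```

If the returned set of confirmed faults is empty, the same function can be called again with the less probable faults as `e_v`.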

3.5 MERITS AND DEMERITS OF THE STRATEGY.

It is evident from the above discussions that the strategy
gives best results in cases where the fault-tree can be
represented by an OR graph only. AND terms are difficult to
tackle in this case from the instrumentation point of
view. Moreover, if any boolean equation describing the causal
relationship between a symptom and its causes contains an AND
term on its RHS, then instrumentation is required for all the
variables (either basic or intermediate events) that are
involved in that term. Furthermore, the strategy fails to use the
knowledge represented by unary negation. This is because
unary negation cannot be represented by AND or OR
functions. In fact, representation of unary negation in an AND-OR
graph requires an additional operator. However, for MYCIN-like
diagnosis problems, the fault trees encountered can be
represented by a graph having OR links only, each link being
directed from the effect to the cause. Negations and
conjunctions are rarely encountered except for systems like
purely digital systems. Even for such systems, a MYCIN-like set of
rules can describe the cause-effect relationship between
symptoms and the causes quite effectively, but representing
negation also requires the concept of "disbelief". Typically, a
combination of symptom(s) points to a set of faults, each with a
certain probability. This method is therefore usable for systems
whose fault-tree can be represented by simple MYCIN-like rules.

The formulation suggested in section 3.1 for dealing with the
AND entries, though not exactly logically equivalent, does not
give rise to logical inconsistency and does not search for the
occurrence of a CLOSED node. For dealing with highly formal
systems, e.g. diagnosis of "stuck at 0" or "stuck at 1"
faults, probabilistic techniques are not required and faults can
be detected using deductive reasoning only.

CHAPTER FOUR

CASE STUDY ONE - EMSR SYSTEM OF CANDU REACTORS 

The reactor shutdown system for CANDU reactors consists of two
systems, viz. the Electromechanical Shutdown Rod (EMSR) and the
Liquid Poison Rod (LPR) shutdown systems. In this chapter, the
fault-tree of the CANDU EMSR system is presented and the expected
performance of the diagnosis algorithm for diagnosing faults in the
aforesaid system is put forward.

4.1 FAULT TREE FOR EMSR SYSTEM. 

The EMSR system for CANDU reactors has been discussed in Chapter
two. Fig. 4.1 is the schematic representation of a single
electromechanical shutdown rod system. The EMSR system is said to
operate successfully if twelve out of fourteen rods reach the end
support SP in the guide tube within two seconds and rest on it.
Because rods are triggered in pairs by seven scram signal circuits,
failure of two or more instrument channels of the seven or failure
of three out of fourteen rods to insert completely in the core
without damaging the support is considered as EMSR system failure.

Therefore, failure of EMSR system = (Failure of two or more of the
instrument channels) OR (Failure in safe insertion of three or more
out of fourteen rods).
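
The failure criterion can be written as a short predicate. The function name and the boolean-list inputs are illustrative assumptions.

```python
def emsr_system_failed(channel_ok, rod_ok):
    """channel_ok: 7 booleans, one per scram signal circuit.
    rod_ok: 14 booleans, True if the rod was safely inserted and came to
    rest on the end support SP within two seconds."""
    failed_channels = sum(not ok for ok in channel_ok)
    failed_rods = sum(not ok for ok in rod_ok)
    # system fails on >= 2 failed channels OR >= 3 failed rods
    return failed_channels >= 2 or failed_rods >= 3
```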

The TOP event for the fault-tree of EMSR is defined as

FAILURE OF EMSR TO FALL TO THE BOTTOM IN TWO SECONDS AND REST ON
THE END SUPPORT SP.

The fault-tree for a single EMSR is shown in fig. A.I. For
diagnosis purposes using probabilistic techniques, it is necessary
to have the failure probabilities associated with the basic
events. For the present case, the failure rate data have been taken
from an earlier analysisH which depends largely on 
WASH-1400. Failure rate corresponding to each basic event is 
tabulated in Table 4.1. The AND-OR graph corresponding to the
fault-tree is shown in fig. 4.2.





[Figure: fault-tree for Electromechanical Shutdown Rod failure. TOP
event: failure of EMSR to fall to the bottom in 2 sec.]

Table 4.1

Data on failure rate or unavailability of Basic Events of
fault-tree of Electromechanical Shutdown Rod.

Event  Description                                  Failure rate
e1     shaft 2 fails to rotate                      0.609
e2     gears G2 & G3 locked                         0.1S
e3     gears G1 & G2 locked                         0.1S
e4     shaft 3 fails to rotate                      0.609
e5     rope is stuck in a groove                    not known
e6     shaft 1 fails to rotate                      0.609
e7     shaft 4 fails to rotate                      0.609
e8     shaft 5 fails to rotate                      0.609
e9     instrument channel fails                     1.612
e10    shift in instrument channel calibration      48.33
e11    spring SR (a pair) fails to release          0.42
e12    accelerator spring fails to release          0.0241
e13    support SP breaks                            0.024
e14    rod fails to reach support in 2 sec.         not known
e15    testing & maint. of instrument channels      1.2 x 10 5
e16    testing of mech. systems                     0.001
e17    maint. of mech. systems                      2.138 x 10 -B

Notes:

(1) Failure rate is expressed as a mean per $10^6$ hr.

(2) No failure rate data is available for events $e_5$ and $e_{14}$.

4.2 ANNOTATIONS PERTAINING TO AND-OR TREE FOR THE CANDU EMSR.

T -> Top Event;

$s_1$, $s_2$, $s_3$ and $s_4$ are the intermediate events.

Following is the semantic significance of the symbols associated
with the intermediate events:
$s_1$ -> Rod fails to fall;
$s_2$ -> Scram signal is not received;
$s_3$ -> Instrument channel shift in operation;
$s_4$ -> Rod fails to fall in 2 sec.

All of them are assumed to have instrumented fault indicators.

The basic events $e_{16}$ and $e_{17}$ have not been included in the
AND-OR tree.

There is no failure rate data corresponding to the basic events
$e_5$ and $e_{14}$. To ensure detection of $e_5$ and $e_{14}$,
$\lambda_5 = \lambda_{14} = \lambda_{max}$, where $\lambda_{max}$ is the
failure rate corresponding to the event with the highest failure
rate in the set of all possible events that can be considered for
probabilistic search. Owing to the extremely high failure rate
value associated with $e_{10}$, which is much larger than the
others, it has been instrumented and has been linked to an
intermediate event, viz. $s_3$. It is thus evident that if $s_3$ is
tested, then $e_{10}$ will not be tested. $e_{10}$ will therefore
never be considered for probabilistic search. Accordingly,
$\lambda_{max}$ = 1.616/$10^6$ hr, which is equal to $\lambda_9$. It is

to be noted that $e_{14}$ will not be considered for probabilistic
search since its status can be determined by examining the state of
$s_4$.

4.3 PERFORMANCE ANALYSIS OF THE DIAGNOSIS ALGORITHM FOR THE PRESENT
CASE.

From the AND-OR tree it is evident that only discrete
combinations of intermediate events are possible. Corresponding to
each of these combinations, two indices M and M' are defined. M
represents the number of basic events to be considered in the worst
case, when even low probability faults are to be considered. M'
represents the number of basic events comprising the set of most
probable faults. This is equal to the number of searches when at
least one fault in the set of most probable faults is found to have
occurred. It is to be noted that M represents the number of elements
in the set of possible faults after the first tier of search and M'
represents the number of elements in the set of most probable
faults.

The indices M and M' are based on the following assumptions:
(i) Time of operation associated with all faults is the same.
(ii) $\lambda_i T_i$ is a small quantity, where $\lambda_i$ is the
equivalent failure rate associated with the i-th fault and $T_i$ is
the time of operation associated with the fault.

The valid combinations of occurrence of the intermediate events
and corresponding values of M and M' are provided in Table 4.2. The
formulation put forward in Table 4.2 is valid for the first
occurrence of the TOP event. Thereafter, the number of searches
required will be dictated by the faults themselves. But if the
probabilistic nature of the faults is assumed to hold, then the
average worst case will not be much different from this.


TABLE 4.2

Table of valid combinations of intermediate events for
the fault-tree of CANDU EMSR and corresponding values of
performance indices.

       SYMPTOMS
s1   s2   s3   s4      M     M'
0    0    0    0       0     0
0    0    0    1       1     1
1    0    0    0       11    6
1    0    0    1       11    6
1    1    0    0       11    6
1    1    0    1       11    6
1    1    1    0       11    6
1    1    1    1       12    7

CHAPTER FIVE

CASE STUDY TWO - TROUBLE SHOOTING IN IBM PC

The diagnosis strategy developed for fault diagnosis has been
applied for guided trouble-shooting in an IBM PC and can guide
tests to track component level faults. The knowledge base covers
double drive IBM PCs, though some aspects of the IBM PC XTs are also
covered by it. Fault trees were constructed for problems related to
(i) Starting up
(ii) Reading/Writing
(iii) Keyboard problems

However, the fault trees in none of the cases incorporate human
errors or errors with coded diagnostic messages which are provided
by the built-in diagnosis software of the PC itself. Another
important aspect of these fault trees is the fact that data is not
available for a large number of basic events, but almost all of them
can be tested to determine their state. The modularisation of the PC
faults into separate modules makes trouble-shooting easier by
selecting the appropriate problem area (e.g. Starting up,
Reading/Writing, and Keyboard problems), but at the cost of some
interrelated information.

5.1 DATA TREATMENT FOR TROUBLE SHOOTING THE IBM PC.


A database of failure rates of major electrical and electronic
components has been prepared. Failure rate associated with the fault
is then computed from the failure rates of the associated
components. The data is obtained from [2].

For any system, if any one family of parts has a failure rate of
$\lambda_j$ then the system failure rate will be

$$\lambda = \sum_{j=1}^{n} n_j \lambda_j$$

In the above expression $n$ is the total number of such families
and $n_j$ is the total number of components of family $j$ which
jeopardize the system. It is evident that the value of $n_j$ depends on
system configuration. The more accurate the value of $n_j$, the more
expertly the system will diagnose the faults.

Setting $n_j = 1 \;\forall j$, we get the failure rate corresponding to a
system comprising one component of each kind. This has an
advantage as well as a disadvantage. The advantage lies in the fact
that the system description is very simple and it assumes no
knowledge about the system configuration. But at the same time much
of its power to tell apart more probable faults from the less
probable ones is lost. Thus a diagnosis based on this simple model
will fail to distinguish a fault associated with a single resistor
and a fault involving a collection of resistors on the basis of
probabilities only, unless the number of components is specified. The
interface software designed to build the diagnosis knowledge base
prompts the user for the number of components and assumes it to be
unity by default.

For the present case, diagnosis is done mostly by setting
$n_j$ equal to unity as most of the faults are associated with
individual components at particular locations only. Moreover, during
the normal operation of a PC the RAM ICs are used much
more. Therefore, the failure rate associated with RAM ICs is multiplied
by a factor (10 in the present case) to qualitatively mark
failure of a RAM IC as a more probable fault than others.
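
The computation of the equivalent failure rate described above can be sketched as follows. This is only an illustrative sketch and not the interface software itself: the function name, the family names and the way RAM families are flagged are assumptions made here, and the two rates are borrowed from Table 5.1 purely as sample numbers.

```python
# Sketch of the equivalent failure rate: sum of n_j * lambda_j over the
# component families tied to one fault. Names are hypothetical.

RAM_USAGE_FACTOR = 10.0  # weighting applied to RAM ICs in the present case


def equivalent_failure_rate(family_rates, counts=None, ram_families=()):
    """Return sum_j n_j * lambda_j for one fault.

    family_rates: dict mapping family name -> failure rate lambda_j
    counts:       dict mapping family name -> n_j (unity when not given)
    ram_families: family names whose rate is scaled by RAM_USAGE_FACTOR
    """
    counts = counts or {}
    total = 0.0
    for family, rate in family_rates.items():
        n_j = counts.get(family, 1)  # n_j defaults to unity
        factor = RAM_USAGE_FACTOR if family in ram_families else 1.0
        total += n_j * factor * rate
    return total


# One IC and one resistor, counts left at the default of unity.
simple = equivalent_failure_rate({"ic": 0.0703, "resistor": 0.0025})

# Same fault, but the user states that four resistors are involved.
detailed = equivalent_failure_rate(
    {"ic": 0.0703, "resistor": 0.0025}, counts={"resistor": 4}
)

# A single RAM IC, weighted by the usage factor.
ram = equivalent_failure_rate({"ram_ic": 0.0703}, ram_families={"ram_ic"})
```

With the default counts, a fault involving one resistor and a fault involving four resistors would receive the same rate; supplying the count is what separates them.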

Although fault-trees have been constructed for all the
problem areas mentioned at the beginning of this chapter, analysis
is done for start up related problems only. The sets of Boolean
equations describing the cause-effect relationships for the other
problems are also presented. The analysis technique is similar to
that for start up related problems and therefore is excluded for
the sake of brevity.

For building up the cause-effect relationships, some of the
diagnosis aids available were also considered.



5.2 FAULT TREE FOR START UP RELATED PROBLEMS IN THE IBM PC AND ITS
TREATMENT

The fault tree for start up related problems in the IBM PC is shown
in fig. 5.1 and the corresponding AND-OR search tree is presented in
fig. 5.2. Table 5.1 presents the failure probabilities associated

with each basic event. The failure rates represent mean failure
rate/10^6 hrs.

5.3 ANNOTATIONS PERTAINING TO THE AND-OR SEARCH TREE 

T → Top Event = Failure to turn on and boot up when power is
switched on.

s1, s2, s3 are the intermediate events.

s1 → Power light off and no response.

s2 → Power light is on, but nothing on the screen.

s3 → Power light is on, but drive will not boot a disk.

Failure rate data is not available for e4, e6, e7 and e8. Therefore
$\lambda_4$, $\lambda_6$, $\lambda_7$ and $\lambda_8$ are each assigned the
same equivalent failure rate, following the same line of logic
presented in Chapter four.

All other assumptions remain the same as those formulated in
Chapter four.








TABLE 5.1

Data on equivalent failure rate associated with each Basic Event
in the fault-tree for start up related problems in IBM PC.

Event   Description                      Failure rate

e1      Power supply failure             0.1564
e2      Bad 8284                         0.0703
e3      Bad 8088                         0.0703
e4      Bad X-tal                        not known
e5      Bad ROM                          0.0703
e6      Bad disk                         not known
e7      Bad data or disk dislodged       not known
e8      Cable bad or loose               not known
e9      Bad 74s38 at 1f                  0.0025
e10     Bad resistor at 5e               0.0025
e11     Bad 74ls86 at 5d                 0.0703
e12     Bad 74ls74 at 5e                 0.0703
e13     Bad 311 at 5b                    0.0025
e14     Bad 562 at 4a                    0.0025
e15     Bad 733 at 3a                    0.0025
e16     Bad 74ls240 at u18               0.0703
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5.4, PERFORMANCE ANALYSIS OF THE DIAGNOSIS STRATEGY. 

The valid sets of symptoms and the values of the indices M and
M' corresponding to each combination are presented in Table 5.2.

The definitions of these indices remain the same as those put
forward in Chapter four. This formulation, again, is for the
first occurrence of any fault only. But assuming that the probabilistic
nature of occurrence of faults holds good, the value of M' for an
average worst case will not deviate much from the value shown.



TABLE 5.2

Valid sets of symptoms and corresponding values of performance indices
for start up related problems in the IBM PC.

Note: If the symptom s1 is seen then neither s2 nor s3 can be
detected at the same time. However, as seen from the AND-OR tree,
the symptoms s2 and s3 can occur independently. This is a
consequence of the inability to represent negation. The above
analysis is based on the AND-OR graph only.
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5.5 BOOLEAN FORMULATION OF CAUSE-EFFECT RELATIONSHIPS FOR 
FAULTS RELATED TO DRIVE READ/WRITE FAILURES IN IBM PC. 

T → Failure of one or both the drives to read and/or to write.
The TOP event is related to the intermediate events by the
boolean equation T = s1 + s2 + s3 + s4 + s5 + s6.

s1, s2, s3, s4, s5 and s6 are the intermediate events and the
annotations are as follows:

s1 → a particular drive does not read.

s2 → none of the drives read.

s3 → a particular drive reads but does not write.

s4 → both drives read but none of them write.

s5 → drives cannot be accessed.

s6 → Computer locks.

The relationship between each of the intermediate events and the
corresponding causes is presented in the following set of boolean
equations. All the terms appearing on the RHS of these equations are
basic events and they are annotated separately.

s1 = ee1 + ee2 + ee3 + f1 + a3 + a4 + b5 + c5 + d5

s2 = ee1 + u18 + u19 + u20 + u21 + u22 + u23 + u24 + u25 + u26

s3 = [illegible] + ee4 + ee5 + e2 + b5 + c5

s4 = ee1 + ee4 + u9 + u11

s5 = ee5 + u12 + u14 + u16 + u17 + u26

s6 = ee6 + ee7 + u3 + u6

The annotations of the basic events are as follows:
ee1 → cable bad or loose.
ee2 → bad head or misalignment.
ee3 → bad or unformatted disk.
ee4 → write-protect switch is bad.
ee5 → analog connectors are corroded.
ee6 → keyboard failure.
ee7 → bad RAM chip.
f1 → bad 74s38 at 1f
e5 → bad 221 at 5e
d5 → bad 74ls386 at 5d
c5 → bad 74ls74 at 5c
b5 → bad 311 at 5b
a4 → bad 562 at 4a
a3 → bad 733 at 3a
u18 → bad 74ls240 at u18
u19 → bad 74ls1?1 at u19
u20 → bad MC4024 at u20
u21 → bad MC4024 at u21
u22 → bad 74ls112 at u22
u23 → bad 74ls161 at u23
u25 → bad 74ls112 at u25
u26 → bad 74ls02 at u26
e2 → bad 74ls14 at 2e
b2 → bad 74ls0? at 2b
u11 → bad 74ls175 at u11
u9 → bad 7438 at u9
u17 → bad 74ls273 at u17
u12 → bad 74ls04 at u12
u2? → bad 74ls02 at u2?
u14 → bad 74ls08 at u14
u16 → bad 7438 at u16
u6 → bad 8288 at u6
u3 → bad 8088 at u3
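
Since every equation in this section is a pure OR of basic events, the first deductive tier can be illustrated with plain set operations. The sketch below is an assumption-laden illustration rather than the thesis software: the dictionary layout and function names are invented here, the event identifiers are as read from the equations above, and s3 is left out because one of its terms is not legible in the source.

```python
# OR-only cause sets for the drive read/write fault tree (s3 omitted).
CAUSES = {
    "s1": {"ee1", "ee2", "ee3", "f1", "a3", "a4", "b5", "c5", "d5"},
    "s2": {"ee1", "u18", "u19", "u20", "u21", "u22", "u23", "u24", "u25", "u26"},
    "s4": {"ee1", "ee4", "u9", "u11"},
    "s5": {"ee5", "u12", "u14", "u16", "u17", "u26"},
    "s6": {"ee6", "ee7", "u3", "u6"},
}


def candidate_faults(observed):
    """Basic events that could explain at least one observed symptom."""
    result = set()
    for symptom in observed:
        result |= CAUSES[symptom]
    return result


def common_faults(observed):
    """Basic events that, on their own, could explain every observed symptom."""
    sets = [CAUSES[symptom] for symptom in observed]
    if not sets:
        return set()
    return set.intersection(*sets)


# A particular drive does not read (s1) and none of the drives read (s2).
shared = common_faults({"s1", "s2"})
```

The union gives the set of probable faults that a probabilistic second tier would then rank; the intersection is only a convenience for spotting a single cause shared by several symptoms.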

5.6 BOOLEAN FORMULATION OF CAUSE-EFFECT RELATIONSHIPS FOR FAULTS
RELATED TO KEYBOARD PROBLEMS IN THE IBM PC.

T → TOP event = Keyboard error occurs.
The TOP event is related to the intermediate events s1, s2, s3, s4
and s5 by the boolean equation T = s1 + s2 + s3 + s4 + s5. The
annotations for s1, s2, s3, s4 and s5 are as follows:

s1 → Signal not reaching the mother board.

s2 → No character is being generated.

s3 → No signal getting to the data bus.

s4 → Keyboard does not respond or prints bad characters.

s5 → Keyboard stays in upper or lower case.

The relationship between the intermediate events and the
corresponding events causing them is represented by the
following set of boolean equations. All symbols other than those
used to represent the aforesaid intermediate events are basic
events.


s1 = ee1

s2 = ee2

s3 = ee3 + u24 + u36

s4 = s1 + s2 + s3

s5 = ee4 + ee5

The annotation for the basic events is as follows:
ee1 → bad or loose cable.
ee2 → bad video circuitry.
ee3 → bad 8048 located inside keyboard.
ee4 → sticky key.
ee5 → bad Caps lock key.
u24 → bad 74ls322 at u24 on system board
u36 → bad 8255 at u36 on system board



Note:

(i) For the last two cases, the coding of symbols for basic events
has been done in such a way that they point to the location of the
fault. This is for ease of comprehension.

(ii) The locational details used for diagnostic messages are
valid for IBM PC circuit boards only.

(iii) For Keyboard related problems, internal faults in the Keyboard
have not been considered as the Keyboard is a third party product
for an IBM PC and hence there are different types of Keyboards
available.
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CHAPTER SIX 
CONCLUSIONS 

The diagnosis strategy has been tested using the fault-trees for
the CANDU EMSR and for START UP related problems in an IBM
personal computer. For testing the EMSR system, the simulator has
been used and for testing START UP problems, an IBM PC/XT with a known
fault has been used.

6.1 RESULTS OF CASE STUDY- ONE AND DISCUSSION. 

Faults were simulated using the simulator developed for 
electromechanical systems and using the principles of simulation 
described in Chapter two. In computing the values associated with 
the intermediate events, a single EMSR is considered and for 
computing the neutron population at different instants the entire 
EMSR system has been considered. For a 225 MWe CANDU nuclear
reactor, fig. 6.1 and 6.2 depict the change in observed values from
the normal to the faulty state when the fault e (shaft 2 locked in
bearings, fig. 4.1) occurs at t = 1 sec after the scram signal has
been received. The time base for this plot is provided by the 
simulator clock and while computing the neutron population, the 
outage of a single rod is considered at t = 1 sec. It is to be
remembered that outage of a single EMSR does not produce a failure
for the entire system and hence the plots only serve to show
qualitatively the difference between normal and faulted states.

In order to test the effectiveness of the diagnosis strategy, a
series of faults which produce the same combination of symptoms
were simulated and diagnosis was made. In all these cases, only a
single EMSR has been considered. Table 6.1 shows a listing of the faults
simulated, the time at which they occurred, the symptoms visible and
the number of queries made by the user for the status of the basic
events considered. The time here refers to the age of the system.

TABLE 6.1

Performance of diagnosis strategy for CANDU EMSR

Event    Time (hr.)    Symptoms    Number of queries

e1       10000         s1 only     6
e4       12000         s1 only     5
e1       13000         s1 only     5

The basic events tested in the first case were e1, e4, e5, e6, e7 and
e8. The basic events tested in the second case were
e4, e5, e6, e7 and e8. The basic events tested in the third case were
e1, e5, e6, e7 and e8.

It is thus seen that the formulation in Chapter four holds good for
the present system and the search trajectory conforms to the expected
one. It is evident that the strategy is intelligent enough to
select the most probable of all the possible faults and also uses
the fact that once a fault has occurred, the probability of
occurrence of the same fault in the near future is actually much lower,
assuming that the fault is repaired once it is detected. Thus, though
the symptoms point to event e1 in the first case, as well as in the
second, event e1 is not tested in the second case after its
detection in the first.
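
The effect of a repair on the probability of a fault can be illustrated with a small sketch. It assumes the exponential model P = 1 - exp(-λT) with T counted from the last renewal of the component; the function name, the failure rate and the two system ages used below are hypothetical values chosen only for illustration.

```python
import math


def fault_probability(rate_per_hr, system_age_hr, last_renewal_hr=0.0):
    """Probability of a fault, with its clock restarted at the last repair."""
    component_age = system_age_hr - last_renewal_hr
    return 1.0 - math.exp(-rate_per_hr * component_age)


rate = 1.0e-5  # hypothetical equivalent failure rate (per hour)

# Fault detected and repaired when the system is 10,000 h old.
p_at_first_failure = fault_probability(rate, 10_000)

# 2,000 h later: the repaired component is only 2,000 h old ...
p_after_repair = fault_probability(rate, 12_000, last_renewal_hr=10_000)

# ... whereas an identical component that was never repaired is 12,000 h old.
p_never_repaired = fault_probability(rate, 12_000)
```

Under these assumptions the repaired fault drops well below its unrepaired neighbours, which is consistent with it being left out of the set of more probable faults on the next diagnosis.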

The symptom table gives a listing of the intermediate events
which have been detected. A further assumption in each case is that
the TOP event has occurred, which is of course quite obvious.

For a description of the events e1 and e4 and the intermediate
event s1, the annotations pertaining to the AND-OR tree of fig.
4.3 are to be considered.

6.2 RESULTS OF CASE STUDY-TWO AND DISCUSSION.

For testing the diagnosis strategy for the problems related to the
IBM personal computer, an IBM compatible PC/XT was chosen and only
start up problems were considered. The PC/XT selected was one which
had a problem with the power supply only. The reason behind doing this is
the fact that the fault-tree for the start up problems has been
developed for a double drive IBM PC and, since the mother
boards for both IBM PC and PC/XT are identical, a large subset of
common defects and corresponding diagnosis messages exists. The
fault was restricted to power supply problems because
most of the diagnostic messages involved component
location level details for the IBM PC, whereas the computers
available for testing were only IBM compatible ones with different
locational demarcations for the components on the respective
boards.

Table 6.2 shows the symptoms visible for the fault considered
and the number of queries. The symbols used conform to the
fault-tree presented in Chapter five.

TABLE 6.2 

Performance of diagnosis strategy for Start up problems of the IBM 
PC. 


Event 

Symptoms 

_ 

Number 



of queries. 

e 

s 

0 


& 

1 
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Note» — 

e1 → Power supply failure.
s1 → Power light off and no response.

It is to be noted that fault-trees were also considered for other
problems associated with the IBM personal computer, viz. problems
related to read/write failures and keyboard problems.
However, since most of these are third party
products, the locational details of the components on their boards
vary greatly but the fault-tree remains more or less the same. So
the same database can be used for other systems by altering the
diagnostic messages. Furthermore, detection of component level faults
using the developed diagnosis strategy requires testing of
faulty components during the search and locational details are
very useful for this purpose.

6.3 SUGGESTED METHODS OF IMPROVEMENT AND SCOPE FOR FURTHER WORK.

The principal difficulty encountered in applying the diagnosis
strategy in certain cases is the lack of proper reliability
data. For example, while applying the diagnosis strategy for fault
diagnosis in electronic systems, the interface asks only for the
components associated with a particular fault and computes the
equivalent failure rate from these in an oversimplified manner,
assuming one component for each family in the composition of the
subsystem associated with the fault unless otherwise stated by the
user. Rarely is failure rate data available for all the basic events
associated with any fault-tree.

This problem can be tackled in two ways. In the first case, the
reliability data can be obtained by rigorously computing the
failure rate associated with each basic event and then entering the
failure rate directly instead of the present way of representation
of the reliability data. For this purpose the synthetic tree model
as followed in the fault-tree construction code DRAFT can be
applied and component failure transfer functions may be used.

Another alternative is knowledge modification using learning
methods. Heuristically, one can argue that if one particular
component fails often and it is not tested while considering the
most probable faults, then obviously its probability of occurrence
is under projected, and if it fails less often but is considered as
a probable fault often, then its probability of occurrence is over
projected. Thus the individual a priori probability for each basic event
$e_i \in E$ may be assumed to have a coefficient $C_i$ which can be adjusted
suitably, e.g. if the fault occurs often but is not included
in the set of probable faults, then the value of the coefficient
associated with its a priori probability should be incremented and
vice versa. This approach has been followed in the case of Chess
playing programs. The suitable equilibrium values and equilibrium
states will involve quantification of the search and once
equilibrium is reached, further modification of the value of a
particular component may be stopped.
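
A minimal sketch of such a coefficient adjustment is given below. The thesis does not specify an update rule, so the additive step, the lower bound and the function name are all assumptions made here for illustration.

```python
def update_coefficient(coeff, occurred, was_in_probable_set,
                       step=0.1, floor=0.1):
    """Adjust the coefficient on the a priori probability of one basic event.

    occurred:            the fault turned out to be the actual cause
    was_in_probable_set: the strategy had ranked it among the probable faults
    step, floor:         hypothetical tuning constants
    """
    if occurred and not was_in_probable_set:
        # Probability was under projected: raise the coefficient.
        return coeff + step
    if not occurred and was_in_probable_set:
        # Probability was over projected: lower it, but keep it positive.
        return max(floor, coeff - step)
    # Prediction and outcome agree: leave the coefficient unchanged.
    return coeff


c = 1.0
c = update_coefficient(c, occurred=True, was_in_probable_set=False)   # raised
c = update_coefficient(c, occurred=False, was_in_probable_set=True)   # lowered
```

Once the coefficient stops changing over a run of diagnoses, it can be regarded as having reached the equilibrium mentioned above and the update can be switched off.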

Extension of the present work may be done in the following
directions involving diagnosis on both simulated and actual
systems:

(i) Modification of the search strategy for tackling AND terms.

(ii) Development of a simulator for periodic forcing functions based
on the present formulations.

(iii) Development of a parallel computer simulation algorithm for
systems involving both feedback and feed-forward links.

(iv) Development of a multi computer on line supervisory system
for on line fault monitoring in large systems.

The present search strategy cannot tackle the AND terms
efficiently. Modification is contemplated in this direction. AND
terms will require some extra predicates for their representation.

It is evident that if each 's' in the transfer function of a
system is replaced by 'jω', then the frequency domain transfer
function is obtained (assuming purely sinusoidal functions). Treating
each transmittance term as a complex quantity rather than a real
number, the present simulation algorithm can be extended to
sinusoidal periodic forcing functions of known frequencies.
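
The substitution of jω for s can be illustrated with complex arithmetic. The first-order lag used below, together with its gain and time constant, is a hypothetical transmittance chosen only to keep the example short; it is not taken from the simulator.

```python
import cmath
import math


def first_order_lag(s, gain=2.0, tau=0.5):
    """Hypothetical transmittance G(s) = gain / (tau*s + 1)."""
    return gain / (tau * s + 1)


def frequency_response(transfer_fn, omega):
    """Evaluate G(j*omega) and return (magnitude, phase in radians)."""
    g = transfer_fn(complex(0.0, omega))
    return abs(g), cmath.phase(g)


# At omega = 1/tau the lag gives gain/sqrt(2) and a phase of -45 degrees.
magnitude, phase = frequency_response(first_order_lag, omega=2.0)
```

Because Python's complex type supports the usual arithmetic operators, the same transmittance expression serves both the real-valued and the complex-valued case, which is the kind of extension the paragraph above has in mind.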

Since feedback and feed-forward path computations can be done
independently, a parallel algorithm can be formulated for fast
simulation in cases where the system transmittances are time
variant ones.

For detection of faults in a large system, fast on-line monitoring
can be done by using deductive and probabilistic analysis on a
number of computers sharing information, each being entrusted with a
particular job.



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ERRATA 

1. Page 3, Section 1.2, line 4 - 'exist' should be read as 'exists'.

2. Page 4, Section 1.3, last line - 'fault tree' should be read as
'AND-OR search tree'.

3. Page 7, line 6 - 'dedicate' should be read as 'dedicated'.

4. Page 11. line 2- *XCO = U^CtV should be read as *XCt)«U_a3 * 
q. Page 13. line O - *P* should be read as *P # * 

6. Page 15, line 1 - 'associated at' should be read as 'associated
with'.

7. Page 22, Section 2.4, line 16 - 'fourth node represents' should be
read as 'fifth and sixth represent'.

8. Page 23, line 10 - 'cotrollers' should be read as 'controllers'.

9. Page 26, Section 2.5, line 20 - 'this' should be read as 'these'.

10 Page 35. line 10 - should be read as £ k . 
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APPENDIX - I 

DEVELOPMENT OF CERTAINTY FACTOR FROM ORIGINAL MYCIN MODEL. 

According to MYCIN (a medical diagnosis program), a hypothesis
that a particular disease has occurred when certain evidence is
seen is associated with two measures, a measure of belief and a
measure of disbelief denoted by $MB[h,e]$ and $MD[h,e]$ respectively,
where 'e' is any evidence and 'h' is any hypothesis which can be
drawn from 'e' with a certain probability. The belief and disbelief
measures are defined as:

$$
MB[h,e] =
\begin{cases}
1 & \text{if } P(h) = 1 \\[4pt]
\dfrac{\max[P(h|e),\, P(h)] - P(h)}{\max[1,0] - P(h)} & \text{otherwise}
\end{cases}
$$

and

$$
MD[h,e] =
\begin{cases}
1 & \text{if } P(h) = 0 \\[4pt]
\dfrac{\min[P(h|e),\, P(h)] - P(h)}{\min[1,0] - P(h)} & \text{otherwise}
\end{cases}
$$

where $P(h|e)$ is the probability that the hypothesis 'h' is true
given evidence 'e'.

The first term states that if the a priori probability of 'h'
is 1 (in which case it is known to occur) or the evidence found in
its favor indicates a probability higher than its a priori
probability, then there is reason to believe that the hypothesis
'h' is true.

The second term states that if 'h' is known never to occur or if
the evidence found indicates a probability lower than
the a priori probability of 'h', then there is reason to believe
that 'h' is not true.

Thus if $P(h|e) > P(h)$ then $MB[h,e] > 0$ and $MD[h,e] = 0$. Likewise, if
$P(h|e) < P(h)$, then $MB[h,e] = 0$ and $MD[h,e] > 0$.

On the basis of these measures of belief and disbelief, a
certainty factor (C.F) is defined as follows:

$$C.F = MB[h,e] - MD[h,e]$$

Thus, C.F > 0 if $P(h|e) > P(h)$;

C.F < 0 if $P(h|e) < P(h)$;

C.F = 0 if $P(h|e) = P(h)$.

If the C.F for a particular hypothesis is greater than 0, then the
hypothesis is accepted according to the MYCIN model.
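
The three definitions above translate directly into code. The sketch below follows the formulas as stated in this appendix, with max[1,0] and min[1,0] written as the constants 1 and 0; the function names are chosen here for illustration.

```python
def measure_of_belief(p_h_given_e, p_h):
    """MB[h,e]: 1 when P(h) = 1, else the relative rise above the prior."""
    if p_h == 1.0:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)


def measure_of_disbelief(p_h_given_e, p_h):
    """MD[h,e]: 1 when P(h) = 0, else the relative fall below the prior."""
    if p_h == 0.0:
        return 1.0
    return (min(p_h_given_e, p_h) - p_h) / (0.0 - p_h)


def certainty_factor(p_h_given_e, p_h):
    """C.F = MB[h,e] - MD[h,e]."""
    return (measure_of_belief(p_h_given_e, p_h)
            - measure_of_disbelief(p_h_given_e, p_h))


cf_supported = certainty_factor(0.8, 0.5)   # evidence raises the probability
cf_opposed = certainty_factor(0.2, 0.5)     # evidence lowers the probability
cf_neutral = certainty_factor(0.5, 0.5)     # evidence leaves it unchanged
```

The three sample calls reproduce the three sign cases listed above: positive, negative and zero.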

For the present case, since negations are not involved, a
particular symptom can only point to a fault with a certain
probability and as such only the first term (i.e. $MB[h,e]$) need
be computed. The measure of disbelief need not be computed in this
case and eliminations are based on deductive reasoning only.

The term $P(h)$ in the original MYCIN model denotes the a priori
probability of the hypothesis being true, and there is a qualitative
relationship between the a priori probability of a particular
hypothesis being true and the probability of it being true as
suggested by the evidence, in the sense that a particular
hypothesis having a higher a priori probability associated with it
will have a sizable probability suggested by the evidence when it
occurs. However, in the present case, for a set of symptoms, a fault
may only be considered as a probable one or its state may be
deduced with certainty, and the symptoms have neither a
quantitative nor a qualitative relationship with the actual a
priori probability of occurrence of the fault. They merely define
the fault as a member or a non member of the set of probable
faults. The term $P(h)$ in the original expression for the measure of
belief is replaced by a probability $P(e_i) = 1/n$, where $n$ is the
number of elements in the set of probable faults obtained after the
first tier of search based on deductive reasoning. The probability
term $P(e_i)$, therefore, assumes that all the faults in the aforesaid
set are equally probable and denotes an average value.

The term $P(h|e)$ can be replaced by the term $P(e_i|T)$ to indicate
the probability of occurrence of a fault (i.e. a basic event) $e_i$
relative to the other basic events in the set of probable faults $E$
(whose states are not known definitely) when the TOP event
occurs and is accompanied by the occurrence of a set of
symptoms. The term $P(e_i|T)$ is computed in the following manner.

Let $P(e_i)_a = 1 - \exp(-\lambda_i T_i)$ denote the actual probability of
occurrence of a fault, where $\lambda_i$ denotes the equivalent failure rate
associated with the fault (basic event) $e_i$ and $T_i$ is the time of
operation for the components assuming the components to be fresh. It
is to be noted that once a fault is detected and the faulty
components are replaced or repaired, they are supposed to start a
fresh life.

After computing the value of $P(e_i)_a$, the next step is to see how
it compares with the corresponding probability values for the other
probable faults. This is achieved through normalization, and $P(e_i|T)$
then indicates the probability value associated with the fault $e_i$
relative to the others in the set of probable faults. The normalized
probability $P(e_i|T)$ can be obtained using the relation below:

$$P(e_i|T) = \frac{P(e_i)_a}{\sum_{j=1}^{n} P(e_j)_a}$$

and the corresponding certainty factor is defined as

$$C.F(e_i|T) = P(e_i|T) - P(e_i)$$

where $P(e_i) = 1/n$ is the average value. All faults with C.F ≥ 0 are
classed as more probable faults and faults with C.F < 0 are less
probable faults. It is evident that the
more probable faults have a relative probability more than the
average value and the less probable faults have a value less than
the average.
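
The normalization and classification just described can be sketched as follows. The function names and the dictionary-based interface are assumptions made for illustration, and the failure rates and operating times are hypothetical numbers, not data from the thesis.

```python
import math


def certainty_factors(rates, times):
    """Return C.F(e_i|T) = P(e_i|T) - 1/n for each probable fault.

    rates: dict fault -> equivalent failure rate
    times: dict fault -> time of operation since the fault's last renewal
    """
    actual = {e: 1.0 - math.exp(-rates[e] * times[e]) for e in rates}
    total = sum(actual.values())
    n = len(actual)
    if total == 0.0:
        # All components are brand new: nothing separates the faults.
        return {e: 0.0 for e in actual}
    return {e: actual[e] / total - 1.0 / n for e in actual}


def more_probable_faults(rates, times):
    """Faults whose certainty factor is non-negative, in sorted order."""
    cf = certainty_factors(rates, times)
    return sorted(e for e, value in cf.items() if value >= 0.0)


rates = {"e1": 2.0e-6, "e2": 1.0e-6, "e3": 1.0e-7}   # hypothetical, per hour
fresh_system = {"e1": 10_000, "e2": 10_000, "e3": 10_000}
first_pass = more_probable_faults(rates, fresh_system)

# e1 has just been repaired, so its own clock restarts.
after_repair = dict(fresh_system, e1=100)
second_pass = more_probable_faults(rates, after_repair)
```

Since the normalized probabilities sum to one and the average is 1/n, the certainty factors always sum to zero, so at least one fault is always classed as more probable.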

From the discussions made so far two things become evident. These
are:

(i) In a set of all probable faults, no matter what the individual a
priori probability values for the probable faults be, only the most
probable ones amongst those are selected.

(ii) As the time of operation associated with a fault
increases without the fault occurring, the a priori probability of
that fault increases. This indicates that for the same set of
symptoms, the certainty factor associated with the fault increases
with time. In other words, the fault becomes increasingly likely for
the same set of symptoms.
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RUNNING THE SOFTWARE 

A. 2.1. RUNNING THE SIMULATOR. 

To run the simulator one has to run the SIMUL.EXE file. This is
the stand alone version of the simulator software and can be used
to simulate electromechanical systems. It is a menu-driven software
and the main menu permits the following modes:

(1). Build
(2). Edit
(3). Run
(4). Fault
(5). Quit

Selection in the main menu is done by hitting the space bar (this
places the selector at the appropriate choice) and pressing the
letter 's' or 'S'. Pressing the space bar after the selector is
placed at the 5th menu item (i.e. Quit) resets the position of the
selector to the position of the first menu item, i.e. Build.

Each item in the main menu has a sub-menu associated with it. The
Build sub-menu consists of the following items:

(a) Build a new system
(b) Import an already created system

Using the first option, an electromechanical system can be
built. The simulator provides an extremely user friendly interface
for building a system. As soon as the Build option is selected, four
windows appear on the screen. The largest of these four is the
canvas. The extreme right window shows the menu of the forward path
components (series) that can be simulated. The bottom-most window is
the dialogue window and it guides the user during the building
process. It also takes inputs from the user in the graphics mode (i.e.
during building). The selection of an appropriate location on the
canvas is done by moving the location selector (a small white
circle) in the canvas. The selector movement commands are 'h' for
moving left, 'j' for moving downwards, 'k' for moving upwards and 'l'
for moving right. To build a particular component, one has to take the
locator to an appropriate location on the canvas, press any key
other than the aforesaid four to inhibit the locator movement
routines and then press the appropriate key as directed by the
menu window. This builds the sub graph corresponding to the item
chosen. The user is prompted by the dialogue window to type in the
values of the required parameters (like armature resistance for a
motor or gear ratio for a gear). It is to be noted that in the
building mode, all the parameters (be they real or integer) are read in
as an 8 character string and converted to an integer or real as
required by the particular case.

The next step is numbering of the items built. This process
establishes the connectivity between the individual sub graphs and
builds the signal flow graph for the above system. The numbers are,
again, to be fed in by the user through the dialogue window and each
is a two character string. The numbering routine automatically
indicates which component's number it wants.

The build routine then asks the user whether
feed-back/feed-forward paths need be built. The answer is 'y' for
yes and 'n' for no. It may be mentioned here that all the feed-back
and feed-forward paths within any particular series path component
are tackled when that component is built. If the user responds with
a 'yes' to the above query, the menu window shows him which key to
press for beginning a feed-back/feed-forward path and which key to
press to terminate it. To initiate a feed-back from one particular
component, the user has to take the locator to the appropriate
component, place it within the circular zone marked there, inhibit
the locator movement and press 'b' or 'B'. To terminate the
feed-back he has to take the locator to the appropriate component
and follow it up with identical actions except that he has to press
the key 's' or 'S' to select the termination point. During the
process of selection (both during feeding in and feeding out) the
message window (bottom right) shows which component is being selected
and the dialogue window states what can be fed out or fed in. If a
valid feed-back or feed-forward path is formed, the dialogue window
asks for the value of the path transmittance associated with
it. Invalid paths are automatically taken care of. At this stage,
after entering the path transmittance, if the user wishes to come
out of this path building routine he simply presses the key indicated
to terminate the process; anything else returns the control to the
path builder routine. This routine is followed by two other routines
which are invoked if the user so desires, viz. the routine for
building timer relays and associating them with contactors and the
routine involving the nuclear interface for the simulator. All these
routines are guided by detailed instructions through the dialogue
window and are therefore left out for the sake of brevity. The locator
movement and the selection routines remain the same in all cases.



The most important part of the building process is the routine
involving templating. The user is first asked to enter the name of
the file where the template details will be stored. The user has to
start first with the intermediate events, select the appropriate
channel (a particular node in the graph) and mention whether he
wants instrumentation for that or not. If he does require
instrumentation, the value is recorded, compared with the normal value
and messages are sent. The user has to repeat the same process for the
basic events and for the TOP event. It may be noted that for
ascertaining the occurrence of an intermediate event, nodal
values are considered and for ascertaining the occurrence of basic
events, the corresponding branch transmittances are considered.

After the construction part is over, the user is asked if he
wants to save the system. If he does, he has to enter the name of
the DOS file (with full drive and directory specification) without
any extension and the system details are stored there.

The Build option also permits the user to import an already built
system into the work area. In order to import the system, the user
again has to type in the filename in the same way as he does while
saving.

The Edit option of the main menu permits the user to edit an 
imported system. This option can be used to edit series 
components, feed-back/feed-forward components, timer relays and 
attributes associated with the nuclear interface. The process is 
done in the text mode and the user is provided with instructions at 
every stage. The principal edit codes available are 'a' meaning 
append, 'd' meaning delete, 'c' meaning change and 'q' meaning quit 
edit mode for that particular group of components. 
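
The four edit codes can be pictured as a small command loop over one 
group of components. The sketch below is illustrative only: the 
function name edit_group, the list representation of a group and the 
tuple layout of a command are assumptions, not the simulator's actual 
data structures. Only the codes 'a', 'd', 'c' and 'q' come from the text. 

```python
def edit_group(components, commands):
    """Apply edit commands to one group of components.

    Each command is a tuple whose first item is the edit code:
    ('a', item), ('d', index), ('c', index, item) or ('q',).
    This tuple layout is hypothetical.
    """
    items = list(components)  # work on a copy of the group
    for cmd in commands:
        code = cmd[0]
        if code == "a":        # append a new component
            items.append(cmd[1])
        elif code == "d":      # delete the component at an index
            del items[cmd[1]]
        elif code == "c":      # change the component at an index
            items[cmd[1]] = cmd[2]
        elif code == "q":      # quit edit mode for this group
            break
        else:
            raise ValueError(f"unknown edit code: {code!r}")
    return items
```

Commands that follow a 'q' are ignored, which mirrors leaving the edit 
mode for that group. 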



The Run option can be selected for a normal case and also for an 
abnormal case. For the normal case, the Run option simulates the 
system response (after linearization, when required) and displays the 
results for some selected nodes (to be selected by the user). It also 
keeps a record of the normal values associated with all the 
intermediate events and the TOP event. These values are stored in a 
file designated by the user. The user has to input the initial and 
final values of the input signal, the type of function he wants, the 
time-step required for computation and the total time of run. The 
timer routine associated with the Run module keeps track of the 
simulated time and displays it. 
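
As a rough illustration of how the run inputs fit together, the sketch 
below builds the sampled input signal from the initial value, final 
value, function type, time-step and total run time. The function name 
input_signal and the two function types 'step' and 'ramp' are 
assumptions made for the example; the text does not list the function 
types the simulator actually offers. 

```python
def input_signal(initial, final, kind, dt, total_time):
    """Return a list of (time, value) samples for the input signal.

    kind is 'step' or 'ramp'; both names are assumed for illustration.
    """
    steps = int(round(total_time / dt))
    if steps <= 0:
        raise ValueError("total_time must cover at least one time-step")
    samples = []
    for k in range(steps + 1):
        t = k * dt  # simulated time, as tracked by a timer routine
        if kind == "step":
            # jump from the initial to the final value after t = 0
            value = initial if k == 0 else final
        elif kind == "ramp":
            # move linearly from initial to final over the whole run
            value = initial + (final - initial) * k / steps
        else:
            raise ValueError(f"unknown function type: {kind!r}")
        samples.append((t, value))
    return samples
```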

The Fault option has several sub-options covering the creation, 
import, saving, editing and running of faulted systems. A 
library of routines for these faults has been created, covering all 
the series, feed-back/feed-forward path and nuclear interface 
components. The user has to specify the code (a two-character 
string) for the fault concerned, the particular component, and the 
time (w.r.t. the simulator clock) at which he wants the fault to 
occur. The routines for building, editing, saving and retrieving 
faulted system details are supported by detailed 
instructions. However, the Run module for fault mode involves special 
consideration. It now not only computes the system response, but 
also compares the values with the normal ones and generates a 
report (report filename to be provided by the user) stating which 
are the nodes and/or branches at which it has found changes. 
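
The comparison step of the fault-mode run can be sketched as follows. 
This is a minimal illustration, not the simulator's code: the function 
name changed_channels, the dictionary layout and the tolerance 
parameter are all assumptions, since the text does not say how large a 
deviation counts as a change. 

```python
def changed_channels(normal, observed, tolerance=1e-6):
    """List the nodes/branches whose value departs from the normal one.

    normal and observed map a channel name (node or branch) to a value.
    tolerance is a hypothetical threshold for calling a value changed.
    """
    changed = []
    for name, normal_value in normal.items():
        if name not in observed:
            continue  # channel not computed in this run, nothing to compare
        if abs(observed[name] - normal_value) > tolerance:
            changed.append(name)
    return sorted(changed)
```

The returned list corresponds to the contents of the report: the nodes 
and/or branches at which a change was found. 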

The simulator output now has to be interfaced with the 
diagnosis software. This is done by a batch file, 'SETTUP', which 
contains a number of files (2 actually) which do the decision 
making, match the changed parameters with the fault template for 
the system and tell the user which of the intermediate events (having 
instrumentation) have occurred. It also maintains a 'teller' window 
which can be accessed by the user to know the status of any other 
basic or intermediate event. The format for setting up the interface 
is 'SETTUP <report filename> <template filename>'. 
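
The matching of changed parameters against the fault template can be 
pictured with the sketch below. It is an illustration under stated 
assumptions: the record type TemplateEntry, its fields and the two 
function names are hypothetical, and the real template file format is 
not described in the text. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateEntry:
    """One line of a (hypothetical) fault template."""
    event: str          # name of the event in the fault tree
    kind: str           # 'intermediate', 'basic' or 'top'
    channel: str        # node (or branch) that signals the event
    instrumented: bool  # whether instrumentation was requested


def occurred_events(template, changed):
    """Intermediate events with instrumentation whose channel changed."""
    changed = set(changed)
    return [e.event for e in template
            if e.kind == "intermediate" and e.instrumented
            and e.channel in changed]


def teller(template, changed, event):
    """Status of any single event, instrumented or not."""
    changed = set(changed)
    for entry in template:
        if entry.event == event:
            return entry.channel in changed
    raise KeyError(event)
```

Here occurred_events plays the role of the message to the user, while 
teller stands in for the query made through the 'teller' window. 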

A.2.2. BUILDING THE KNOWLEDGE BASE FOR DIAGNOSIS. 

This involves telling the diagnosis software what the 
intermediate events are, what the basic events are, what the failure 
rates associated with each component are, what components are 
associated with each basic event, and what the causal 
relationships are between the intermediate events and the 
corresponding basic events. This is performed in steps and each step 
is accompanied by detailed instructions. For calculating the 
equivalent failure rates associated with each basic event, the failure 
rate data of some of the components is incorporated in a database 
('MAST.DAT') which must remain in the same directory as the interface 
software. The database has been prepared using [2] and [5]. For 
preparing the interface, the command format is 'INTERFC <filename>', 
where filename is the name of the family of files which store the 
knowledge base. The knowledge base is stored in files of the form 
filename.#. 
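
The text does not state how the equivalent failure rate of a basic 
event is obtained from the component data. The sketch below assumes 
the simplest rule, namely that the rates of the components associated 
with a basic event are added, as for independent components with 
constant failure rates where any one failure causes the event. The 
function name and the dictionary standing in for the MAST.DAT records 
are hypothetical. 

```python
def equivalent_failure_rate(components, rates):
    """Sum the failure rates of the components behind one basic event.

    rates maps a component name to its failure rate and stands in for
    the component database. Summation is an assumed rule, not one
    taken from the text.
    """
    missing = [c for c in components if c not in rates]
    if missing:
        raise KeyError(f"no failure rate data for: {missing}")
    return sum(rates[c] for c in components)
```

For example, a basic event tied to two components with rates 2e-6 and 
3e-6 per hour would be assigned 5e-6 per hour under this assumption. 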

A.2.3. RUNNING THE DIAGNOSIS SOFTWARE. 


For running the diagnosis software, the batch file DIAG is to be 
invoked. The format is 'DIAG <filename>', where filename is the same as 
that used while interfacing. The diagnosis software is based on the 
algorithm outlined in chapter three and is accomplished in steps. It 
also records, for future reference, which components are replaced and 
at what time. The process is highly user interactive and 
every step is provided with accompanying messages. 

Originally it was contemplated to run the simulator, the setting 
up software and the diagnosis software on the same computer using 
standard software packages like MS Windows. However, this is not 
being done for the present version of the software, and the entire 
software has to run on two computers, the setting up software on 
one and the diagnosis on the other. Bridging of the two has not 
been done. 